Categorizing and Predicting Discrete Data

In this article, I’m going to introduce a set of algorithms that can categorize, and make predictions from, discrete data composed of sets. Their efficiency appears to be unprecedented: running on an iMac, they categorized a dataset of 200,000 vectors, with a dimension of 100 items per vector, in 13 seconds, with 100% accuracy.

I’m in the process of looking for a more formal dataset (e.g., Netflix preferences) to test them on, but nonetheless, using my own datasets, the results are impressive.

The code is available on my ResearchGate blog.

There’s already another algorithm in my code archive that can form categorizations on datasets that consist of discrete sets (see “optimize_categories_intersection_N”). In terms of how it operates, this “intersection algorithm” is no different than the Euclidean algorithm that I introduced in my main research paper on AI, but the intersection algorithm uses discrete set intersection, rather than the Euclidean norm operator, to compare the difference between two elements of a dataset. That is, the Euclidean algorithm assumes the dataset is comprised of vectors, and so it compares two elements by taking the norm of the difference between those two elements. In contrast, the intersection algorithm compares two elements by treating each element as a set, and taking the intersection of those two sets. For example, if A = \{1, 2, 3\} and B = \{2, 3, 4\}, then the “distance” measure on those two sets would be |A \cap B| = |\{2, 3\}| = 2.

However, the “optimize_categories_intersection_N” algorithm makes use of a novel form of intersection that I’ve developed, called the \delta-intersection operator, which allows us to take the intersection of sets whose elements are close in value, as opposed to exactly the same symbol. So, for example, the \delta-intersection operator allows us to calculate the intersection between sets A = \{1.001, 2.0002\} and B = \{1.004, 2.002, 3.0005\}. If \delta = .1, then |A \cap_{\delta} B| = 2, since |1.001 - 1.004| < \delta and |2.0002 - 2.002| < \delta. I developed this operator in the context of color processing, where you might want to take the intersection count between two sets of colors, and say close enough is good enough to constitute a match. For an explanation of how we determine the appropriate value of \delta, see my research note on vectorized image partitioning.
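
To make the operator concrete, here is a minimal Octave sketch of a \delta-intersection count for two sets of numbers; the function name and the one-to-one matching convention are my own assumptions, not the vectorized implementation referenced above.

% delta_intersection_count: counts the elements of A that match some element
% of B to within delta, matching each element of B at most once.
function count = delta_intersection_count(A, B, delta)
  count = 0;
  matched = false(size(B));
  for i = 1:numel(A)
    for j = 1:numel(B)
      if ~matched(j) && abs(A(i) - B(j)) < delta
        count = count + 1;
        matched(j) = true;
        break;
      end
    end
  end
end

% Example from the text:
% delta_intersection_count([1.001, 2.0002], [1.004, 2.002, 3.0005], 0.1) returns 2.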

In this article, I’m going to present an otherwise identical categorization algorithm that makes use of ordinary intersection. But unlike the vectorized implementation for the \delta-intersection operator, which I’m quite proud of, the implementation of the intersection operator used in the algorithm I’ll present in this note is trivial and almost certainly widely used, which I’ll nonetheless explain below. As a result, in terms of code, there’s really nothing new in this algorithm when compared to the original categorization and prediction algorithms that I discuss in my main research paper on AI.

However, as a theoretical matter, using the intersection operator creates a completely different measure on the dataset when compared to Euclidean distance, since in Euclidean space, two elements are similar if they’re near each other when represented as points, which implies that the norm of their difference will be small. In contrast, if two sets are similar, then they have a lot of elements in common, which will produce a large intersection count. So even though the Euclidean categorization algorithm and intersection algorithm both operate in the exact same manner, using almost identical code, as a theoretical matter, the intersection algorithm is worth discussing, since it generates some interesting mathematics if we think about the intersection operator as a measure of similarity, or distance, on sets.

Moreover, because of the way these algorithms work, we need to have an expectation value for the intersection count between any two sets. We can of course actually calculate the intersection count between every pair of sets in a given dataset and take the average, but this requires O( {{N} \choose {2}} ) calculations, where N is the number of items in the dataset. The calculation of ordinary intersection can be vectorized trivially, so this is by no means intractable. But, using set theory, we can approximate this number using O(N) calculations, which, on a low-end device, could mean the difference between an algorithm that runs in a few seconds, and one that runs in a few minutes.

Measuring the Difference Between Sets

There is no obvious geometric space in which we can visualize sets that would allow us to say, for example, that two sets are “close” to each other. This implies that the ordinary intuition for categorization, which comes from clustering points that are near each other in space, doesn’t work for categorizing sets. Instead, the intersection algorithm relies upon the intersection count to measure similarity between sets, with no notions of distance at all.

Consider a dataset that consists of consumer purchases at a local supermarket, where each element of the dataset represents the items purchased by a particular consumer. This type of data doesn’t need a location in Euclidean space to be represented. Instead, all we need is a set of distinct but otherwise arbitrary symbols that are sufficient in number to represent the items in question. So if the supermarket sells 20 items, then we could use the names of the actual products (assuming they’re distinct), the numbers 1 through 20, the letters A through T, \log_2(20) bits, or any other set of 20 distinct symbols, or any other system that can be in 20 distinct states.

Assume that one of the elements is A = \{apple, banana, cereal\}. Any dataset that consists of sets like A would be a family of sets, and as a result, we can compare elements within the dataset using the set intersection operator. For example, if B = \{apple, banana, bread\}, then |A \cap B| = 2, whereas if C = \{apple, orange, tomato\}, then |A \cap C| = 1, and so we can reasonably say that A and B are more similar to each other than A and C. In general, the larger the intersection count is between two sets, the more similar the two sets are.

If a dataset consists of N individual sets, each of which could contain any one of M items, then we can represent the dataset as a binary matrix of N rows, and M columns, where if entry (i,j) = 1, then set i contains item j, and otherwise, the entries are 0.

We can then calculate the union of any two sets by simply taking the sum of their rows, and testing which entries are greater than zero. Similarly, the intersection is simply the product of corresponding entries. So if A = \{1,2,5\} and B = \{2,7\}, and there are 7 items in the dataset, then we would represent A as the row vector (1,1,0,0,1,0,0) and B as the row vector (0,1,0,0,0,0,1).

We can then calculate their union as,

A \cup B = ((1,1,0,0,1,0,0) + (0,1,0,0,0,0,1)) > 0

= (1,2,0,0,1,0,1) > 0

= (1,1,0,0,1,0,1) = \{1,2,5,7\}.

Similarly, we can calculate their intersection as,

A \cap B = ((1,1,0,0,1,0,0).*(0,1,0,0,0,0,1))

= (0,1,0,0,0,0,0) = \{2\}.

Note that in Matlab, the (.*) operator takes the product between the corresponding entries of two vectors.
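
To make the bookkeeping concrete, here is a short Octave sketch using the row vectors above; the variable names are mine.

A = [1 1 0 0 1 0 0];              % the set {1,2,5}
B = [0 1 0 0 0 0 1];              % the set {2,7}
U = (A + B) > 0;                  % union: (1,1,0,0,1,0,1), i.e., {1,2,5,7}
inter = A .* B;                   % intersection: (0,1,0,0,0,0,0), i.e., {2}
intersection_count = sum(inter);  % 1 element in common
union_elements = find(U);         % recover the item labels: [1 2 5 7]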

Generating Categories

The fundamental approach used in the intersection algorithm is exactly the same as the approach used in all of my algorithms, which is to iterate through different levels of distinction, measuring the entropy of the resultant structure, and then choosing the level of distinction at which the rate of change in entropy is maximized. The intuition for this process is easier to understand in the context of image processing, which actually served as the impetus for all of these algorithms in the first instance.

Consider the image of the red brick wall below, and assume that we want to find the boundaries of the objects in the image. We could accomplish this by identifying significant changes in colors, which in turn requires answering the question of how different two colors have to be in order to distinguish between them. Intuitively, this should depend upon the context of the image. For example, if an image consists solely of very similar colors, then a small change in color could indicate a boundary. In contrast, if an image consists of wildly different colors, then it’s probably only a large change in color that indicates a boundary.


The solution to this problem, which underlies nearly all of my work in AI, is to pick an initial level of distinction \delta, and then gradually increase that value, measuring the rate of change in the entropy of the structure in question as a function of \delta. The point at which the entropy changes the most is the point at which the level of distinction produces the greatest change in structure. Therefore, my algorithms treat this value of \delta as the level of distinction that is best suited for analyzing an object, or dataset. In the case of boundary detection, we’d increase our level of distinction between colors, and measure the entropy of the structure of the boundaries as a function of our level of distinction \delta. As we iterate, there will be a point where the complexity of the boundaries “pops”, causing a spike in entropy, and it is at this value of \delta that the boundary detection algorithm effectively terminates.

In the case of the intersection algorithm, the goal is to generate categories of sets, grouping together sets that are sufficiently similar to warrant co-categorization. Further, our measure is an intersection count, so rather than asking whether the difference between two sets is less than some \delta, we will instead ask whether the intersection count between two sets is at least \delta. That is, the question is restated as whether two sets have enough elements in common to warrant co-categorization.

The method used by the intersection algorithm is otherwise the same as the Euclidean algorithm, in that it picks an initial anchor set from the dataset, and then iterates through the entire dataset, testing whether the intersection count between each set and that anchor set is at least \delta. If it is, that is, if a given set has at least \delta elements in common with the anchor set of a category, then we add that set to the category associated with that anchor set. We do this until all sets have been assigned to exactly one category, test the entropy of the resultant categorization, and then repeat this process for a new and increased minimum intersection value \delta, selecting the categorization associated with the value of \delta that produces the greatest change in entropy.
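
Here is a simplified Octave sketch of that process, assuming the dataset is stored as the N x M binary matrix described earlier; the function names, the choice of anchors, and the use of the category-size distribution as the entropy measure are stand-ins of mine, not the posted “optimize_categories_intersection_N” code.

% generate_categories: assigns each row of the binary matrix D to a category,
% using a minimum intersection count delta with each category's anchor row.
function labels = generate_categories(D, delta)
  N = size(D, 1);
  labels = zeros(N, 1);
  category = 0;
  for i = 1:N
    if labels(i) == 0                          % row i becomes a new anchor set
      category = category + 1;
      labels(i) = category;
      counts = D * D(i, :)';                   % intersection counts with the anchor
      labels(labels == 0 & counts >= delta) = category;
    end
  end
end

% optimize_delta: iterates delta up to max_delta (e.g., the expected
% intersection count), and keeps the categorization at which the entropy
% changes the most between successive iterations.
function best_labels = optimize_delta(D, max_delta, num_iterations)
  prev_H = 0;
  best_change = -Inf;
  best_labels = [];
  for k = 1:num_iterations
    delta = k * max_delta / num_iterations;
    labels = generate_categories(D, delta);
    p = accumarray(labels, 1) / numel(labels); % category sizes as a distribution
    H = -sum(p(p > 0) .* log2(p(p > 0)));      % entropy of the categorization
    if k > 1 && abs(H - prev_H) > best_change
      best_change = abs(H - prev_H);
      best_labels = labels;
    end
    prev_H = H;
  end
end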

The Expected Set

We will of course need to determine the range over which the values of \delta iterate. The Euclidean algorithm uses a function of the standard deviation of the dataset as the maximum \delta, and the minimum \delta is the maximum \delta divided by the number of iterations. By analogy, I’ve used the average, over the entire dataset, of each set’s intersection count with what I call the expected set, which is a mathematical construct, and not an actual set of elements. This calculation tells us, on average, what the intersection count would be between a given set, and another set that contains a typical collection of elements from the dataset.

For example, assume that the dataset consists of the family of sets F = \{ \{1, 2, 3 \}, \{2,14 \}, \{1, 2, 5 \}, \{2, 6\} \}. The union over all of the sets in F is U = \{1, 2, 3, 5, 6, 14 \}, and every item in the union occurs once in the dataset, except for 1, which occurs twice, and 2, which occurs four times. If we randomly select a set from F, the probability of selecting a set that contains 1 is \frac{1}{2}, since 1 appears in two of the four sets. As a result, if a set contains a 1, then the expectation value of that item’s contribution to the intersection count between that set and a random set chosen from F is \frac{1}{2}. If a set contains a 2, then the expectation value of that item’s contribution to the intersection count is 1, since 2 occurs in every set.

More generally, we can take the intersection between a given set and the union over F, and weight each item by the weights just described, where the weight of an item is the number of instances of that item in the dataset, divided by the number of sets in the dataset. Returning to the example above, the union over F is U = \{1, 2, 3, 5, 6, 14 \}, and the corresponding weights are W = \{ \frac{1}{2}, 1, \frac{1}{4}, \frac{1}{4}, \frac{1}{4}, \frac{1}{4} \}. The intersection count between A = \{1,2\} and the expected set over F is the intersection count between A and U, as weighted by the elements of W, which is in this case \frac{1}{2} + 1 = \frac{3}{2}.

Formally, the intersection count between a given set and the expected set is really a function on the elements generated by the ordinary intersection between that set and the union over the applicable family. For example, first we calculate the intersection between A and U, which is the set \{1, 2\}. Then, we apply a function that maps the elements of that set to the elements of W, which are in this case \{ \frac{1}{2}, 1 \}. Then, we take the sum over that set, which I’ve defined as the intersection count between the set in question and the expected set, which is in this case \frac{3}{2}.

If we actually take the intersection count for every pair of sets in F, and then take the average, it comes out to \frac{7}{6} = 1.1667 elements. If we calculate the average intersection with the expected set for each of the four sets, it comes out to \frac{3}{2} = 1.5 elements. On a percentage-wise basis, there is a significant difference between these two values, but nonetheless, this method allows us to generate a reasonable approximation of the average intersection count between every pair of elements with only O(N) calculations (note that we can generate the union by taking the sum over the rows of the entire dataset, which requires O(N) calculations).
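
A minimal Octave sketch of this approximation, with D the N x M binary matrix from earlier; the function name is a placeholder of mine.

% expected_intersection: approximates the average pairwise intersection count
% using the column frequencies of D as the "expected set" weights, in O(N) time.
function e = expected_intersection(D)
  N = size(D, 1);
  W = sum(D, 1) / N;      % weight of each item = number of occurrences / number of sets
  per_set = D * W';       % weighted intersection of each set with the expected set
  e = mean(per_set);
end

% For F = {{1,2,3},{2,14},{1,2,5},{2,6}}, encoded over the items [1 2 3 5 6 14]:
% D = [1 1 1 0 0 0; 0 1 0 0 0 1; 1 1 0 1 0 0; 0 1 0 0 1 0];
% expected_intersection(D) returns 1.5, versus the true pairwise average of 7/6.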

So to summarize, we begin with a dataset comprised of sets, and calculate the expected intersection count using the method outlined above (or actually calculate the average intersection). We then iterate \delta from the expected intersection divided by the number of iterations, up to the expected intersection, constructing categories for each value of \delta, and finally select the categorization produced by the value of \delta that generated the greatest change in the entropy of the categorization.

Predicting Data

The steps outlined above will allow us to categorize data comprised of sets. We can also find the category of best fit for a new input set by testing the intersection count between that input set and each of the anchor sets generated by the categorization algorithm, and then selecting the category corresponding to the anchor set with which the input set had the greatest intersection count. As a practical matter, this would allow us to make predictions given a discrete set that we know to be part of a larger dataset that we’ve already categorized.

Returning to the example above, if we’ve categorized a dataset of prior consumer purchases, and a new set of purchases by a particular consumer fits best in category A, then it’s reasonable to conclude that the consumer in question might also, at some future time, consider purchasing the items contained within the other sets in category A.

But we can do better, by using the same technique that I used for the Euclidean algorithm, which is to repeatedly reapply the categorization algorithm to each of the categories generated by the first application, which will generate subcategories for each category. We keep doing this until the categorization algorithm no longer finds any further opportunities for subdivision. This process will generate a tree of subcategories, where the depth represents the number of times the categorization algorithm was applied.

We can then attempt to place a new input set into the tree, and the advantage of this approach is that the subcategories are by definition smaller than the categories within the top level categorization, which will allow us to search through a larger number of smaller categories for a match, thereby generating more accurate predictions. Because a match is determined by the intersection count, and the intersection operator is vectorized, this can be done quickly in at most O(K) calculations, where K is the number of categories in the tree, which is generally a low multiple of N, the number of items in the dataset.
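
Here is a sketch of the best-fit step in Octave, assuming anchors is a K x M binary matrix whose rows are the anchor sets produced by the categorization; the names are assumptions of mine. Descending the subcategory tree just repeats this step at each level, against the anchor sets of the children of the current node.

% predict_category: returns the index of the anchor set with which the input
% set (a 1 x M binary row vector) has the largest intersection count.
function k = predict_category(anchors, input_set)
  counts = anchors * input_set';   % vectorized intersection counts with each anchor
  [~, k] = max(counts);
end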

Note that this method is distinguishable from testing whether the presence of one item in an input set implies that another item is likely to be in the input set (perhaps as a piece of missing information). So, for example, if a consumer purchases kale, then perhaps, as a general matter, that consumer will also probably purchase quinoa. We could look for correlations like this in a dataset of sets using vectorized processes. Specifically, recall that in this case, each row corresponds to a consumer, and each column corresponds to an item. Each row will contain a sequence of 0/1’s that indicates which items were purchased by each consumer. We can then find, for example, all the rows where column 1 contains a 1, which corresponds to the set of consumers that purchased item 1. We can easily take the sum over any other column we’d like from that subset of rows, which will tell us how many times a consumer that bought item 1 also bought some other item. This will allow us to produce a vectorized measure of correlation between items in the dataset.
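
For illustration, here is a toy Octave sketch of that row-filtering step; the matrix is made up, and the co-occurrence counts are the simple tallies described above, not a formal correlation statistic.

% Among the consumers that bought item 1, count how often each item was bought.
D = [1 1 0; 1 0 1; 0 1 1; 1 1 1];        % toy 4-consumer, 3-item purchase matrix
bought_item1 = D(:, 1) == 1;             % rows (consumers) that purchased item 1
co_counts = sum(D(bought_item1, :), 1);  % co-occurrence counts with item 1: [3 2 2]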

But that’s not what this prediction algorithm does.

Instead, this prediction algorithm takes a set of consumer purchases, and asks, as a general matter, to what category this consumer belongs, based upon the entirety of the consumer’s purchases. This will allow us to generate an entire set of predictions, as opposed to testing whether one particular item implies that some other item is likely.

Specifically, if an input set maps to a particular category, then all of the items in that category are reasonable predictions for the input set, and we can of course sort them by frequency of occurrence, and as a general matter, use and present these predictions in a manner that is best suited for the particular application at hand. So, for example, if we know that a particular consumer likes a particular set of musical groups, then we can generate a set of suggestions based upon the entirety of the consumer’s preferences.

We could conceivably take into account the complete profile of a consumer, across food, music, zip code, credit range, political party, etc., and then categorize consumers based upon the entirety of the data that is available about them. We then can form expectation values for any missing information based upon the statistics of the category to which the consumer is mapped by the prediction function.


Vectorized Image Classification

In this post, I’m going to introduce a set of algorithms that can quickly classify images. Running on an iMac, it took just 1.363 seconds per image (on average) from start to finish (including all pre-processing time) to classify the dataset of images below. Note that this is the amount of time it took per image to train the classification algorithm, as opposed to the amount of time it would have taken to classify a new image after the algorithm had already been trained. Like all of my algorithms, these algorithms have a low-degree polynomial runtime.

Vectorized Boundary Detection

In a previous paper, I introduced a set of algorithms that detect the boundaries in an image by first partitioning an image into rectangular regions, and then scanning those regions from left to right, and top to bottom, and marking the locations of any likely boundaries based upon changes in color. How significant of a change in color is necessary to mark a boundary is discussed in that paper.

At a high level, the only difference between the algorithm I’ll present in this note (the code for which I’ve already posted here) and the algorithm I previously presented, is that this algorithm relies only upon the average color of each region, whereas the previous algorithm sampled about 100 colors from each region. As a result, this algorithm makes use of far less information, and is therefore much faster. However, because it is much faster, it can make use of a more fine-grained partition, which actually increases accuracy when compared to the previous version. The images below show a photograph of a wall that I took in Williamsburg, Brooklyn, together with the boundaries detected by the algorithm, which are drawn in black.

Image Classification

The first step of the classification process is to run the boundary detection algorithm on an image. This will produce two binary matrices: one representing the left to right boundaries, and another representing the top to bottom boundaries. If the (i,j) region of the image contains a boundary, then the (i,j) entry of one of the matrices will contain a 1. As a result, the boundary matrices contain location information about the boundaries in the image. A second function is then called that extracts that location information from the boundary matrices, which produces a collection of coordinates that represent the locations of the boundaries in the underlying image.

In summary, we run a boundary detection algorithm, and then extract the coordinates of those boundaries as a dataset of points, which provides shape information about the underlying image. We can visualize the boundary coordinates by simply removing the image, leaving only the boundaries, which is shown below.
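
Since the boundary matrices are binary, the coordinate extraction amounts to collecting the indices of the nonzero entries; a toy Octave sketch, with made-up matrices, follows.

% Turn the two binary boundary matrices into a list of (row, col) coordinates.
LR = [0 1 0; 0 1 0; 0 0 1];                        % toy left-to-right boundary matrix
TB = [0 0 0; 1 1 0; 0 0 1];                        % toy top-to-bottom boundary matrix
[r1, c1] = find(LR);
[r2, c2] = find(TB);
boundary_points = unique([r1 c1; r2 c2], 'rows');  % combined boundary coordinates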


The MNIST dataset images are too small to be analyzed by this algorithm, so instead, I’ll present two datasets I’ve used previously: a dataset of 40 photos of handwritten A’s and B’s (20 of each), and a dataset of 40 photos of speakers and headphones (20 of each), each taken with an iPhone 6. As is evident, each dataset consists of two classes of images.

The classification is accomplished by first running the boundary detection algorithm on each image, extracting the point information from the boundaries, and then categorizing the resultant two-dimensional shapes as collections of points. In this case, I used the same categorization algorithm that I introduced in my main research paper on AI. Samples of the images are shown below, together with the boundaries detected by the algorithm.

I then test whether the categories generated are consistent with the hidden classification data. Specifically, if a category contains more than one class of image, then I treat that category as a fail. So, for example, if an image of a speaker is categorized together with four images of headphones, then that entire category of five images is treated as a fail. As a result, a category must be perfectly consistent with the hidden classification data in order to constitute a success. The accuracy is then the number of successes divided by the total number of categories. In this case, for the headphones / speakers dataset, the categorization algorithm was 100\% accurate, meaning all categories were perfectly consistent with the hidden classifiers. For the handwritten character dataset, the categorization algorithm had an accuracy of \frac{22}{23} = 95.65\%, since exactly 1 category out of 23 was inconsistent with the hidden classifiers.

All of the code necessary to run these algorithms is available on my researchgate page.

Discrete Data Categorization

Note that there’s an algorithm in my code archive that can form categorizations on datasets that consist of discrete sets, as opposed to Euclidean vectors (see, “Optimize_categories_intersection_N”).

This algorithm would be useful for categorizing data that consists of items with arbitrary labels, rather than positions. For example, datasets of consumer preferences (i.e., foods, music, movies, etc.).

I’m working on a research note explaining some theoretical aspects of the algorithm, but it’s no different than the “Optimize_categories_N” algorithm that I introduced in my main research paper. The only material distinction is that the intersection algorithm uses the intersection operator to compare items, rather than the Euclidean norm operator. That is, the “Optimize_categories_N” algorithm compares two elements by taking the norm of the difference between those two elements, whereas the intersection algorithm uses the intersection count between two sets.

I can see that the vectorized intersection image algorithm is quite popular (based upon the stats), so I thought I’d call attention to this algorithm as well, since it uses the exact same operator I discussed in the vectorized image partition article.

Chamber Music

I’ve just completed two original chamber music pieces, and though I don’t ordinarily post about my compositions, I’m extremely proud of this work, which you can find on my soundcloud page:

Fast Boundary Detection Algorithm

I’ve developed an extremely fast boundary detection algorithm, the code for which you can find on my researchgate page.

It’s based upon the same “big delta” concept I introduced in previous research notes here and here.

The algorithm operates in just a few seconds (often 2 to 3 seconds), and produces awesome results. I’ll follow up with a more formal research note explaining how it works in the next few days.

Here are a few examples:

A Computational Model of Time-Dilation

I just corrected an equation that was, unfortunately, incorrect while I was abroad in Sweden, without access to my main computer and the files necessary to update the paper.

With this correction, I believe I’ve presented the foundations of an entirely new model of physics that is consistent with the Special and General Theories of Relativity, yet nonetheless makes use of objective time.

Researchgate:

A Computational Model of Time-Dilation

Pdf:

A Computational Model of Time-Dilation

Vectorized Image Partitioning

In this article, I’m going to present a low-degree polynomial runtime image partition algorithm that can quickly and reliably partition an image into objectively distinct regions, using only the original image as input, without any training dataset or other exogenous information. All of the code necessary to run the algorithm is available on my code bin.

There’s also a simple script “Consolidate Partition – CMND LINE” that consolidates these features into larger objects.

I’ll follow up with a “crawler” that will assign contiguous regions unique labels to make feature extraction more convenient.

Article: Vectorized Image Partitioning

Vectorized Image Boundary Detection

In a previous note, I introduced an algorithm that can, without any dataset or other prior information, reassemble a scrambled image to its original state. In this note, I’m going to introduce a related algorithm that can quickly find boundaries in an image without any dataset.

Though I’ve yet to formally analyze the run time of the boundary detection algorithm, it is remarkably fast, and certainly a low-degree polynomial. I suspect this algorithm will have applications in real-time object tracking, lane detection, and probably machine generated art.

The core insight of both algorithms is that real life objects are generally structured in a manner that causes colors to be locally consistent. That is, if we look closely at an object, we’ll find that color palettes that are proximate in space are generally similar. Therefore, as we traverse an image in a given direction, if we find that the color palette changes sharply, then we’ve probably transitioned to a new region of the image.

Both algorithms measure local color consistency using a vectorized implementation of set intersection that I’ve developed. Specifically, both test how many colors two regions have in common by taking the intersection of the color sets in the two regions. However, unlike ordinary intersection, these algorithms measure what I call the “\delta-intersection” of two sets, which, rather than test for equality, tests whether the norm of the difference between two color vectors is less than some value \delta. If the norm of the difference is less than \delta, then the two colors are treated as “the same” for purposes of calculating the intersection between the two sets that contain the colors in question.
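
As an illustration, here is a small Octave sketch of a \delta-intersection count for two sets of RGB colors, stored as the rows of P and Q; it counts the colors in P that lie within \delta of some color in Q, which is one reasonable convention, and is not the vectorized implementation referenced above.

% color_delta_intersection: counts the rows of P whose nearest row of Q,
% measured by the Euclidean norm, is closer than delta.
function count = color_delta_intersection(P, Q, delta)
  diffs = permute(P, [1 3 2]) - permute(Q, [3 1 2]);  % pairwise differences (n1 x n2 x 3)
  dists = sqrt(sum(diffs .^ 2, 3));                   % pairwise Euclidean norms (n1 x n2)
  count = sum(min(dists, [], 2) < delta);             % colors in P with a close match in Q
end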

This value of \delta is optimized using the same methods that I introduced previously, which make use of information theory, producing a context-dependent level of distinction that allows us to say whether any two given colors are close enough to be considered the same. You can find an overview of my work in artificial intelligence in my research paper, A New Model of Artificial Intelligence.

How it works

This is just a high level summary, as I’m leaving Sweden tomorrow for New York City, and plan to write a comprehensive, formal research paper on all of my work on color processing and color perception, which is, at this point, extensive.

The first step is to break up the image into rectangular regions using my core image partition algorithm. This will impose a grid upon the image that already contains the macroscopic boundaries of the objects within the image, but because it’s structured as a grid, it also contains edges that don’t correspond to actual boundaries. The example below shows a photograph I took in Williamsburg, Brooklyn, of a small brick wall, together with the image as broken into regions by the partition algorithm. The image on the right shows the average color of each region generated by the partition algorithm, as a visual aid.


The next step is to go through each of these regions, first by “reading” each row of the partition from left to right, and calculating the \delta-intersection of the colors contained in neighboring regions. So beginning with the top left region of the image, which is mostly white, we calculate the \delta-intersection between that region and the region to its immediate right, which is also mostly white. As we transition from left to right, we’re going to eventually reach regions that have a darker tone, causing the intersection count to start to drop off. As you can see, regions (1,3) and (1,4) have significantly different coloring, which is going to cause the \delta-intersection count to suddenly drop off, suggesting that there’s a boundary, which is actually the case.

All of my work so far in artificial intelligence has made use of finite differences to construct categories, and more generally, distinguish between objects. For example, the value of \delta above that we use to distinguish between colors is a fixed value intended as a limit on the difference between two vectors, above which we distinguish between them. I’ve coined the term minimum sufficient difference to describe this concept of \delta generally. That is, in this case, \delta is the minimum sufficient difference between two color vectors necessary to distinguish between the colors. In simple terms, if the difference between two colors is more than \delta, then they’re not the same in the context of the image, and their difference exceeds the minimum sufficient difference for distinction in that context.

However, when “reading” an image from left to right, there might not be a single minimum sufficient difference between intersection counts capable of identifying all of the boundaries in an image. As a simple example, consider the following sequence of integers:

1, 2, 5, 107, 210, 250.

Let’s pick a fixed value of \delta = 6. Reading this sequence from left to right, and calculating the difference between adjacent entries, we would place delimiters as follows:

1, 2, 5 || 107 || 210 || 250.

If these numbers represent the intersection counts between neighboring regions in an image, then this partition is probably wrong, since the numbers 107, 210, and 250, probably all correspond to a single, new region that begins at 107. That is, the correct partition is probably the following:

1, 2, 5 || 107, 210, 250.

This partition cannot be produced using a fixed finite difference. Specifically, since 5 and 107 are categorized separately, it must be the case that 107 - 5 = 102 > \delta. Because 107 and 210 are categorized together, it must be the case that 210 - 107 = 103 < \delta. But obviously, it cannot be the case that \delta < 102 and \delta > 103. Nonetheless, we might need to produce this partition, so as a result, the boundary algorithm makes use of a ratio test, rather than a finite difference test. Specifically, it tests the ratio between the intersection counts of neighboring regions.

Continuing with the sequence of integers above, we would calculate the ratio between 1 and 2 (.5), 2 and 5 (.4), 5 and 107 (.0467), 107 and 210 (.5095), and 210 and 250 (.84). Using this approach, we can fix a minimum ratio of \Delta = .4, which will cause us to draw a boundary at the right place, between 5 and 107.
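
A short Octave sketch of the ratio test on the sequence above; the variable names are mine.

counts = [1 2 5 107 210 250];
min_ratio = 0.4;
ratios = counts(1:end-1) ./ counts(2:end);  % 0.5, 0.4, 0.0467, 0.5095, 0.84
boundaries = find(ratios < min_ratio);      % returns 3: a boundary between 5 and 107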

Applying this approach to the original image of the wall generates the following partition:


I’ve come up with a method of optimizing \Delta that you can see in the code, which is available in my code bin (see, “delimit_image”). I’ll follow up with a feature extraction algorithm based upon this technique, that will identify contiguous regions in an image.

I’ll explain all of this in greater detail from the other side of the Atlantic, in New York City. In the interim, here are three more examples of boundaries generated by the algorithm:


Recovering a Distorted Image With No Prior Information

In this note, I’m going to present an algorithm that can, without any prior information, take a scrambled image, and reassemble it to its original state. In short, it shows that even if we know nothing about a distorted image, we can still figure out what the image was supposed to look like using information theory. The only input to the algorithm is data derived from the scrambled image itself, and the algorithm isn’t recognizing objects. It is, instead, rearranging the scrambled image in a manner intended to maximize the expectation that any objects in the image will be reassembled.

Human beings can solve these types of puzzles easily because we have experience with real world objects, and know, whether consciously or not, what things should look like, generally. Because this algorithm operates with no prior information at all, it suggests that our intuitions for structure might have an even more primordial origin that goes beyond experience, and is instead rooted in how objects in the natural world are generally organized.

This algorithm is yet another example of the utility of my approach to artificial intelligence, which is to make theoretical assumptions based upon information theory about what should happen given a particular input, and proceed mechanistically without a dataset, with the expectation that what is generated should be useful and accurate. Interestingly, this particular algorithm can be fairly characterized as a matrix algebra algorithm, since it swaps entries in a matrix according to a simple closed form formula. As a result, this algorithm has more in common with Gaussian Elimination than a typical machine learning algorithm.

In this note, I’m going to apply this algorithm only to images, but I suspect it has more general applications in information recovery. In particular, I think this algorithm could be used to reassemble not only images, but 3D objects, and though not an area of my personal interest, it seems like it could also have applications in code-breaking and DNA analysis.

Partitioning an Image

Previously, I’ve discussed my image partition algorithm, which makes use of assumptions based in information theory in order to partition an image into objectively distinct features, without any prior information (i.e., without a dataset). You can read about this algorithm, and others, in my working paper, “A New Model of Artificial Intelligence”.

In short, the image partition algorithm breaks an image into regions that have maximally distinct amounts of color information, as measured by the entropy of the color distribution in each region. The result is a very fast edge-detection algorithm that can then be used to extract shape information, and color information, which can in turn facilitate object recognition and image classification. For more on this, see my working paper, “A New Model of Artificial Intelligence: Application to Data II”.

At a high level, the image partition algorithm takes a random image and asks, even though I know nothing about this image, where do I expect to find the edges of the objects in the image? The first example above is a picture of a bird that I found online, and on the right hand side, you can see how the partition algorithm extracts the shape and edge information from the original image. The image on the right consists of all points in the original image that the algorithm believed to be part of the foreground of the original image.

The second example is a photograph I took in Williamsburg, Brooklyn, of a small brick wall that a street artist painted red, and wrote a message on. The image on the left is the original image, and the image on the right is the image after being processed by the partition algorithm, which also produces a series of weights, based upon whether the algorithm thinks the region in question is a foreground feature. In this case, I’ve applied a “macro” version of the partition algorithm that searches for macroscopic objects, such as the wall. The darkness of a region indicates the probability of the region being a foreground feature, with darker regions less likely to be part of the foreground of the image. These weights are not relevant for purposes of this note, but they help to distinguish between the regions in the image identified by the partition algorithm, and demonstrate the algorithm’s ability to locate macroscopic boundaries of objects.

The algorithm I’ll present in this note takes a random image and asks, even though I know nothing about the original state of the image, what was it supposed to look like?

Measuring Local Consistency

Standard Deviation and Average Color

The fundamental observation that underlies the image reassembly algorithm is that when an image is in its original state, colors are generally locally consistent. I’ve deliberately selected this example since it features a bright red wall, green grass, and a blue sky, with each more or less sequestered from the other, which exaggerates this general principle.

If we permute the regions in this image, we’re going to create greater local variation in color, but we’re not going to affect the standard deviation of the colors. This might seem like a trivial observation, but the implication is that permuting the regions adds disorder that is not measured by the standard deviation of the colors.

Having made this observation, I set out to measure what it is that’s changing as we permute the regions in an image. What I’ve come up with is a tractable, and useful measure of local color consistency that also serves as the measure that ultimately allows the reassembly algorithm to function. At a high level, the algorithm swaps regions in the scrambled image, and tests whether the local color consistency score increased or decreased as a result of the swap, and proceeds mechanistically with the goal of maximizing local color consistency. The images above show the average color of each region in the original image, the image after being scrambled 25 times, and a graph showing the total color consistency score as a function of the number of scrambles (i.e., starting with the original image and randomly swapping regions).

Color, Distinction, and Complexity

As noted, at a high level, the reassembly algorithm operates by maximizing the consistency between the colors contained within two regions. This requires us to measure the similarity between sets of colors, which in turn requires that we measure the similarity between individual colors.

Because colors are typically represented as RGB vectors, it’s both natural and convenient to measure the difference between two colors using the norm of the difference between their corresponding color vectors. However, for purposes of the reassembly algorithm, we’ll also have to be able to say whether we should distinguish between two colors. That is, we’ll have to develop a binary test that determines whether or not two colors are similar enough to constitute a “match”.

This question cannot be answered by looking only to the norm of the difference between two color vectors, since the norm operator will produce only a measure of difference. That is, on its own, this difference cannot tell us whether we should distinguish between the two colors. This is essentially the same question that I answered in the development of all of my algorithms, which is, given the context, what is the minimum sufficient difference between two objects that justifies distinguishing between them?

Distinction is what drives complexity, both as a practical and a theoretical matter. That is, the more granular our distinctions are, the more complexity any observed object will have, and the less we distinguish, the less complexity any observed object will have. Though the actual algorithmic complexity (i.e., Kolmogorov Complexity) of an object is not computable, we can get close enough using the Shannon entropy, which is a built-in function in both Matlab and Octave.

As a result, we can find a useful answer to the question of what minimum difference justifies distinguishing between two objects in a dataset by iterating from \delta = 0 up to the standard deviation of the dataset, and measuring the rate of change in the entropy of the structure generated by each value of \delta (i.e., each level of distinction). For example, if \delta = 0, then any difference between two objects will cause us to distinguish between them. Specifically, if \delta = 0, and the norm of the difference between two color vectors is non-zero, then we will distinguish between those two colors.

The core observation that underlies nearly all of my work in artificial intelligence is that the value of \delta that generates the greatest change in the measured structure of the object is the value of \delta at which the greatest change in the structure of the object became apparent. That is, as we iterate through values of \delta, there will be some value that maximizes the rate of change in the entropy of the object, which is the point at which the measured structure of the object changed the most.

Imagine adjusting the focal point of a camera lens – at all focal points before and after the correct focal point, the image is simply blurry, suggesting that the rate of change in structure is probably roughly the same on either side, until you approach the actual focal point, when suddenly, a drastic change in perceived structure occurs. Though this is not a mathematically precise analogy, it is useful for establishing an intuition for why this approach works as a general matter.

The “Delta-Intersection” of Sets

The algorithm measures the similarity between two regions in an image by counting the intersection of the sets of colors contained in the two regions. Ordinarily, intersection is measured by counting the number of common elements contained in two sets. In this case, we treat two colors as “the same” for purposes of calculating the intersection, so long as the norm of the difference between their respective color vectors is less than \delta. That is, we test whether the norm of the difference is less than \delta, rather than testing for equality, which is the ordinary way of calculating intersection. This is done to address the practical reality that because RGB encoding allows for an enormous number of colors, we’re unlikely to find a substantial intersection between two regions if we test for equality between colors.

The object whose complexity we’re measuring is in this case a matrix that contains the intersection count between a given region and its four neighbors (up, down, left, and right). That is, each region in the image can be associated with a row and a column index (i,j), and this matrix contains the total number of colors that the (i,j)-region has in common with each of its neighbors, as measured by the \delta-intersection operator. That is, the (i,j) entry of the matrix contains the sum of the \delta-intersections between the (i,j) region and each of its four neighbors (i.e., we ignore the diagonals).

Note that as we ratchet up the value of \delta, more colors will qualify as the same, increasing the intersection count between all regions. The algorithm will find the optimum value of \delta that maximizes the rate of change in the entropy of the matrix.

In calculating the entropy of the matrix, we treat the intersection count in each entry as a frequency, divide by the total intersection count across all entries, and then treat the resulting set of numbers as a probability distribution. This means that as the intersection count becomes more uniform across the image, the entropy of the matrix will increase. In turn, this means that images that are uniformly color-consistent will have a high entropy, whereas images that have pockets of highly consistent regions, and other pockets of less consistent regions, will have lower entropy. This is similar to how entropy is used generally, as a measure of dispersion, but in this case, the quantity whose dispersion we are measuring is the intersection count as a portion of the total intersection count over the entire matrix.
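
A minimal Octave sketch of that entropy calculation on the neighbor-intersection matrix C; the function name is mine.

% intersection_entropy: treats the normalized intersection counts in C as a
% probability distribution and returns its Shannon entropy.
function H = intersection_entropy(C)
  p = C(:) / sum(C(:));   % normalize the counts to a distribution
  p = p(p > 0);           % drop empty entries
  H = -sum(p .* log2(p));
end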

The results of the algorithm of course depend upon the complexity of the image, and as the example above shows, it does produce errors with more complex scenes, and fails to recognize some of the obvious outlier regions. Nonetheless, it clearly uncovers a significant portion of the original structure of the image, and moreover, I expect to be able to improve its performance without significantly impacting runtime by tweaking some of the underlying variables, such as the number of colors it samples in each region.

After I’m convinced that I’ve maximized its performance, I’m going to follow up with a more formal working paper that measures the performance of the algorithm, and possibly, present other applications of this algorithm outside of image reconstruction. But for now, I can say that it is a very low-degree polynomial runtime algorithm (I believe O(M^3), where M is the number of regions the image is broken into) that performs well, and can be run on cheap consumer devices.

I’ll also follow up with a new image partition algorithm that makes use of the \delta-intersection.

The relevant Matlab / Octave code can be found in the Algorithms tab (see, “unsup_image_rebuild_fast”). Subroutines can be found by typing the function name in the search field on my code bin.