Attached is code I wrote in Octave that can, without any training data or other prior information, identify features within an image. It is a completely deterministic algorithm that does not make use of any statistical or machine-learning techniques.
The theoretical basis of the algorithm is that objectively distinct features of an image should contain different amounts of information. The algorithm measures information content using the Shannon entropy of the color distribution of the image. Specifically, it breaks an image into a series of sub-images and then measures the Shannon entropy of each sub-image.
It begins by measuring the entropy of the image as a whole, and then divides the image into N^2 equally sized regions for N = 2, 3, 4, … (i.e., 4, 9, 16, … sub-images). It then calculates the entropy of each region for each value of N, and selects the value of N that maximizes the standard deviation of the region entropies.
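The search over N can be sketched as follows. This is an illustrative sketch only, not the attached maximize_std_dev implementation: the function name sketch_best_N is hypothetical, and it uses the gray-level distribution as a stand-in for the color distribution described above.

```octave
% Hypothetical sketch of the N search -- not the attached maximize_std_dev.
% Requires the Octave "image" package (for rgb2gray).
function [best_N, best_std] = sketch_best_N(I, max_N)
  G = double(rgb2gray(I));     % gray levels as a proxy for the color distribution
  best_std = -Inf; best_N = 1;
  for N = 2:max_N
    h = floor(rows(G)/N); w = floor(columns(G)/N);
    H = zeros(N, N);
    for i = 1:N
      for j = 1:N
        block = G((i-1)*h+1 : i*h, (j-1)*w+1 : j*w);
        p = histc(block(:), 0:255) / numel(block);  % empirical distribution
        p = p(p > 0);
        H(i, j) = -sum(p .* log2(p));               % Shannon entropy of region
      end
    end
    if (std(H(:)) > best_std)                       % keep N with max spread
      best_std = std(H(:)); best_N = N;
    end
  end
end
```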
The theoretical significance of this maximization is that the image is, as a result, broken into sub-images that contain maximally different amounts of information, which we view as a proxy for regions that contain objectively distinct physical features.
Using the optimal value of N generated as described above, we then weight the pixel values of each of the resulting N^2 regions by,
S = (H_u – H_R)^2 / s^2
where H_u is the average entropy over all regions of the image, H_R is the entropy of the region in question, and s is the standard deviation of the entropies over all regions within the image. That is, if a region contains an amount of information approximately equal to the average amount contained across all regions of the image, it will be darkened significantly; whereas if a region contains an amount of information significantly above or below average, it will be brightened significantly.
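The weighting step can be sketched as below. This is a hypothetical illustration, not the attached ent_contrast: the function name sketch_region_weights is an assumption, and H is taken to be the N-by-N matrix of region entropies computed earlier.

```octave
% Hypothetical sketch of the weighting step -- not the attached ent_contrast.
% H: N-by-N matrix of region entropies.
function S = sketch_region_weights(H)
  Hu = mean(H(:));              % average entropy over all regions
  s  = std(H(:));               % standard deviation of the region entropies
  S  = ((Hu - H).^2) ./ s^2;    % near-average regions -> S near 0 (darkened);
end                             % far-from-average regions -> large S (brightened)
```

Each region's pixel values would then be scaled by its weight S(i,j).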
The net result is (1) “unremarkable” regions are significantly darker and “notable” regions are significantly brighter, and (2) the image is “quantized” into regions.
I have attached two examples of images that were processed using this algorithm. One is of myself on a boat, in which the algorithm clearly recognizes my eyes, mouth, ears, and hand (and my beer). The other is of handwritten text (which was not subject to copyright), and it clearly recognizes the portion of the page with text, and the pen. As noted above, the algorithm is not trained to notice anything at all, and is entirely deterministic.
It turns out that the algorithm is actually quite good at pulling features from all types of images, regardless of subject matter.
I have also attached a “fast” algorithm that searches for an optimal value of N rather than simply iterating through all N (attached as “maximize_std_dev”, an outline of which is also attached as a handwritten note that is easier to follow).
The functions should be called from the Octave command line as follows:
I = imread("[ Image file path ]");
[N Y X] = maximize_std_dev(I);
Z = ent_contrast(I,N,X,Y);
Z will contain the entropy-adjusted image. The original image must be in an RGB format.
I’ve also attached two scripts that compare two images using this approach, and return a score that measures how similar they are (Delta), after their features have been extracted. The comparison algorithm attached is set up to compare two individual images, but is really intended to compare one image to two databases of images.
For example, in the spirit of Google’s “bird vs bike” contest, this algorithm is intended to take an input image and then compare it to a database of bird images and a database of bike images. Whichever database generates the greatest average score per image within the database is the “most similar” to the input image.
The algorithm works by comparing the average color and the standard deviation of the color of each region of both images, with the regions generated using the algorithm above. That measure for each region is then weighted according to the product of (X) the information contained in that region, as measured by its entropy, and (Y) the “importance” of the region, as measured by S above. The theory behind the algorithm is that the more information a region contains, the more “points” the algorithm should allocate for each “unit” of similarity; and the more “important” a region is, as measured by S, the more “points” the region should get for each “unit” of similarity.
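The weighted comparison can be sketched as below. This is a hypothetical illustration, not the attached score_image or weighted_img_compare: the function name sketch_compare is an assumption, it assumes both images were processed with the same N, and the exact way the weights combine with the color differences may differ from the attached scripts.

```octave
% Hypothetical sketch of the weighted comparison -- the attached
% score_image / weighted_img_compare may differ in detail.
% mu1, sd1, mu2, sd2: N-by-N per-region mean and std dev of color;
% H1, S1: region entropies and importance weights S for image 1.
function Delta = sketch_compare(mu1, sd1, mu2, sd2, H1, S1)
  d = abs(mu1 - mu2) + abs(sd1 - sd2);   % raw per-region color difference
  w = H1 .* S1;                          % weight = information x importance
  Delta = sum(sum(w ./ (1 + d)));        % larger Delta = more similar
end
```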
Interestingly, because the algorithm does not make use of any “real” shape information, it is invariant under rotation. Nonetheless, it seems to work quite well.
The functions should be called from the command line as follows, after the images I1 and I2 have been processed using the algorithm described above:
[A1 B1 C1 D1] = score_image(I1,N1,X1,Y1);
[A2 B2 C2 D2] = score_image(I2,N2,X2,Y2);
temp = size(I1);
num_pix_1 = floor(temp(1)/N1)*floor(temp(2)/N1);
temp = size(I2);
num_pix_2 = floor(temp(1)/N2)*floor(temp(2)/N2);
Delta = weighted_img_compare(A1,B1,C1,D1,N1,Y1,num_pix_1,A2,B2,C2,D2,N2,Y2,num_pix_2)
Finally, note that the Octave “image” package is required to run this algorithm.