Vectorized Image Classification

In this post, I’m going to introduce a set of algorithms that can quickly classify images. Running on an iMac, it took just 1.363 seconds per image (on average) from start to finish (including all pre-processing time) to classify the dataset of images below. Note that this is the amount of time it took per image to train the classification algorithm, as opposed to the amount of time it would have taken to classify a new image after the algorithm had already been trained. Like all of my algorithms, these algorithms have a low-degree polynomial runtime.

Vectorized Boundary Detection

In a previous paper, I introduced a set of algorithms that detect the boundaries in an image by first partitioning an image into rectangular regions, and then scanning those regions from left to right, and top to bottom, and marking the locations of any likely boundaries based upon changes in color. How significant of a change in color is necessary to mark a boundary is discussed in that paper.

At a high level, the only difference between the algorithm I’ll present in this note (the code for which I’ve already posted here) and the algorithm I previously presented, is that this algorithm relies only upon the average color of each region, whereas the previous algorithm sampled about 100 colors from each region. As a result, this algorithm makes use of far less information, and is therefore much faster. However, because it is much faster, it can make use of a more fine-grained partition, which actually increases accuracy when compared to the previous version. The images below show a photograph of a wall that I took in Williamsburg, Brooklyn, together with the boundaries detected by the algorithm, which are drawn in black.

Image Classification

The first step of the classification process is to run the boundary detection algorithm on an image. This will produce two binary matrices: one representing the left to right boundaries, and another representing the top to bottom boundaries. If the (i,j) region of the image contains a boundary, then the (i,j) entry of one of the matrices will contain a 1. As a result, the boundary matrices contain location information about the boundaries in the image. A second function is then called that extracts that location information from the boundary matrices, which produces a collection of coordinates that represent the locations of the boundaries in the underlying image.

In summary, we run a boundary detection algorithm, and then extract the coordinates of those boundaries as a dataset of points, which provides shape information about the underlying image. We can visualize the boundary coordinates by simply removing the image, leaving only the boundaries, which is shown below.

Screen Shot 2019-07-15 at 7.22.13 PM

The MNIST dataset images are too small to be analyzed by this algorithm, so instead, I’ll present two datasets I’ve used previously: a dataset of 40 photos of handwritten A’s and B’s (20 of each), and a dataset of 40 photos of speakers and headphones (20 of each), each taken with an iPhone 6. As is evident, each dataset consists of two classes of images.

The classification is accomplished by first running the boundary detection algorithm on each image, extracting the point information from the boundaries, and then categorizing the resultant two-dimensional shapes as collections of points. In this case, I used the same categorization algorithm that I introduced in my main research paper on AI. Samples of the images are shown below, together with the boundaries detected by the algorithm.

I then test whether the categories generated are consistent with the hidden classification data. Specifically, if a category contains more than one class of image, then I treat that category as a fail. So, for example, if an image of a speaker is categorized together with four images of headphones, then that entire category of five images is treated as a fail. As a result, a category must be perfectly consistent with the hidden classification data in order to constitute a success. The accuracy is then the number of successes divided by the total number of categories. In this case, for the headphones / speakers dataset, the categorization algorithm was 100\% accurate, meaning all categories were perfectly consistent with the hidden classifiers. For the handwritten character dataset, the categorization algorithm had an accuracy of \frac{22}{23} = 95.65\%, since exactly 1 category out of 23 was inconsistent with the hidden classifiers.

All of the code necessary to run these algorithms is available on my researchgate page.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s