Black Tree AutoML Kickstarter Campaign

Dear Readers,

I’ve decided to fund my A.I. venture through Kickstarter, which is the most attractive avenue for keeping things moving, since it effectively allows me to sell discounted versions of my software upfront and keep whatever proceeds I raise:

https://www.kickstarter.com/projects/blacktreeautoml/black-tree-automatic-machine-learning

There are four levels of funding:

1. $5

2. $25

3. $100

4. $500

Each tier gives one person a lifetime commercial license to an increasingly sophisticated version of my software, which I plan to charge for monthly once the software is released on Apple’s App Store.

The idea is to completely democratize access to A.I. by taking the software to market through the Apple App Store and offering it at a small fraction of what competing products charge, the premise being that a global audience paying a small sum per month is more profitable than marketing only to enterprise clients that pay large sums of money –

The plan is to sell the McDonald’s of A.I., for $5 or $50 per month, depending on what you need.

There’s a video on the Kickstarter page that demonstrates the software, but at a high level, it is, as far as I know, the most efficient machine learning software on Earth, and incomparably more efficient than Neural Networks, accomplishing in seconds what takes a Neural Network hours. It is also fully autonomous, so you don’t need a data scientist to use it –

Anyone can use it, as it’s point and click.

This will allow small and large businesses alike to make use of cutting-edge A.I. that could help manage inventory, predict sales, predict customer behavior, etc.

I’ve also shown that it’s capable of diagnosing diseases using medical imaging, such as brain MRIs, again with accuracy that matches or exceeds that of a neural network:

https://derivativedribble.wordpress.com/2021/09/23/confidence-based-prediction/

Because it runs quickly on a laptop, it will allow medical care to be delivered in parts of the world where it is otherwise unavailable, and at some point I plan to offer a free version for exactly this purpose.

Your support is appreciated, and it’s really going to make a difference.

Best,

Charles

Matrix / A.I. Library for macOS (Swift)

This is still a work in progress, but it contains a lot of functions that I found to be missing from Swift for doing machine learning and working with matrices in general.

Note this is of course subject to my copyright policy, and cannot be used for commercial purposes.

The actual commercial version of my core A.I. library should be ready in about a month, as this is going much faster than I thought it would.

I’ve also included a basic main file that shows you how to read a dataset, run nearest neighbor, manipulate vectors, and generate clusters. These are exactly the same algorithms you’ll find in my Octave library; the only difference is that they’re written in Swift, which is not vectorized, but has the advantage of allowing publication to the Apple App Store. That’s the only reason I’m doing this, since I don’t really need a GUI for this audience.
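For a sense of what these routines look like, here is a minimal sketch in plain, non-vectorized Swift of reading a numeric dataset and running nearest neighbor. This is illustrative only, not the actual library code, and it assumes a CSV file whose rows are comma-separated numeric features with an integer class label in the last column.

```swift
import Foundation

// A sketch, not the actual library code: read a numeric CSV whose last column
// is assumed to hold an integer class label, then classify a new vector with
// plain nearest neighbor under Euclidean distance.
func readDataset(path: String) throws -> (rows: [[Double]], labels: [Int]) {
    let text = try String(contentsOfFile: path, encoding: .utf8)
    var rows: [[Double]] = []
    var labels: [Int] = []
    for line in text.split(separator: "\n") {
        let fields = line.split(separator: ",").compactMap { Double(String($0)) }
        guard fields.count > 1 else { continue }
        rows.append(Array(fields.dropLast()))
        labels.append(Int(fields.last!))
    }
    return (rows, labels)
}

// Euclidean distance between two feature vectors of equal length.
func euclideanDistance(_ x: [Double], _ y: [Double]) -> Double {
    zip(x, y).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
}

// Return the label of the training row closest to x.
func nearestNeighbor(of x: [Double], rows: [[Double]], labels: [Int]) -> Int {
    var bestIndex = 0
    var bestDistance = Double.infinity
    for (i, row) in rows.enumerated() {
        let d = euclideanDistance(x, row)
        if d < bestDistance {
            bestDistance = d
            bestIndex = i
        }
    }
    return labels[bestIndex]
}
```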

Confidence-Based Prediction

Summary

My core work in deep learning makes use of a notion that I call \delta, which is the minimum distance over which a distinction between two vectors x, y is justified by the context of the dataset. As a consequence, if you know \delta, then by retrieving all vectors y in the dataset for which ||x - y|| \leq \delta, you can generate a cluster associated with the vector x. This has remarkable accuracy, and the runtime is O(N) on a parallel machine, where N is the number of rows in the dataset. You can read a summary of the approach and the results as applied to benchmark UCI and MNIST datasets in my article, “Vectorized Deep Learning”.
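As a concrete illustration of the cluster step (a sketch, not the production code), the following Swift function returns the indices of all rows within \delta of a given vector x in a single pass over the dataset, which is O(N) on one core; each comparison is independent, so the pass parallelizes row-wise.

```swift
// A sketch of the cluster step: return the indices of all rows whose Euclidean
// distance to x is at most delta. One pass over the dataset, i.e., O(N) on a
// single core, with every comparison independent of the others.
func deltaCluster(around x: [Double], rows: [[Double]], delta: Double) -> [Int] {
    func distance(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
    }
    return rows.indices.filter { distance(rows[$0], x) <= delta }
}
```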

Though the runtime is already fast, often fractions of a second for a dataset with a few hundred rows, you can eliminate the training step embedded in this process by simply estimating the value of \delta as a function of the standard deviation of the dataset. Because you’re estimating the value of \delta, the cluster returned could very well contain vectors from other classes, but this is fine, because you can then use the distribution of classes in the cluster to assign a probability to the resultant prediction. For example, if you generate a cluster C for a vector x, and the mode class (i.e., the most frequent class) has a density of .5 within C, then you can reasonably say that the probability your answer is correct is .5. This doesn’t work all the time, and works very well sometimes, depending upon the dataset. It must work where the distribution of classes about a point is the same in the training and testing datasets, which follows trivially from the lemmas and corollaries I present in my article, “Analyzing Dataset Consistency”. But of course, in the real world, these assumptions might not hold perfectly, which causes performance to suffer.
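Here is a sketch of that no-training variant in Swift. The exact function of the standard deviation used to estimate \delta is not reproduced in this post, so the scaled average per-coordinate standard deviation below is only an assumed placeholder; the rest builds the cluster around x and reports the mode class together with its density as the probability.

```swift
// A sketch of the no-training variant. The scaled average per-coordinate
// standard deviation is an assumed stand-in for the actual function of the
// standard deviation used to estimate delta.
func estimateDelta(rows: [[Double]], scale: Double = 1.0) -> Double {
    let n = Double(rows.count)
    let dimension = rows[0].count
    var totalStdDev = 0.0
    for j in 0..<dimension {
        let column = rows.map { $0[j] }
        let mean = column.reduce(0, +) / n
        let variance = column.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
        totalStdDev += variance.squareRoot()
    }
    return scale * totalStdDev / Double(dimension)
}

// Build the cluster around x and return the mode class along with its density
// in the cluster, which serves as the probability that the prediction is correct.
func predictWithProbability(x: [Double], rows: [[Double]], labels: [Int],
                            delta: Double) -> (label: Int, probability: Double)? {
    func distance(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
    }
    let cluster = rows.indices.filter { distance(rows[$0], x) <= delta }
    guard !cluster.isEmpty else { return nil }
    var counts: [Int: Int] = [:]
    for i in cluster { counts[labels[i], default: 0] += 1 }
    let best = counts.max { $0.value < $1.value }!
    return (best.key, Double(best.value) / Double(cluster.count))
}
```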

Another layer of analysis you can apply that allows you to measure the confidence in a given probability makes use of my work in information theory, and in particular, the equation,

I = K + U,

where I is the total information that can be known about a system, K is your Knowledge with respect to the system, and U is your Uncertainty with respect to the system. In this case, the equation reduces to the following:

N \log(c) = K + N H(C),

where N is the size of the prediction cluster, c is the number of classes in the dataset, K is again Knowledge, and U = N H(C), where H(C) is the Shannon entropy of the distribution of classes in the prediction cluster.

You can then require both the probability of a prediction and your Knowledge with respect to that probability to exceed a given threshold in order to accept the prediction as valid.

Knowledge is in this case given by,

K = N \log(c) - N H(C).

You can read about why these equations make sense in my paper, “Information, Knowledge, and Uncertainty”.
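To make the bookkeeping concrete, here is a sketch of the Knowledge calculation and the acceptance test described above. Normalizing K to a [0, 1] scale by dividing by N \log(c), and the use of natural logarithms, are assumptions made for illustration; any fixed base works as long as it is used consistently.

```swift
import Foundation

// A sketch of the Knowledge calculation: given the class counts of a prediction
// cluster, compute H(C), then K = N log(c) - N H(C), and normalize K by its
// maximum possible value N log(c) to land on a [0, 1] scale. Natural logs are
// an assumption; any fixed base works if used consistently.
func knowledge(classCounts: [Int], numberOfClasses c: Int) -> (K: Double, normalized: Double) {
    let N = Double(classCounts.reduce(0, +))
    let totalInformation = N * log(Double(c))          // I = N log(c)
    var entropy = 0.0                                  // H(C)
    for count in classCounts where count > 0 {
        let p = Double(count) / N
        entropy -= p * log(p)
    }
    let K = totalInformation - N * entropy             // K = N log(c) - N H(C)
    let normalized = totalInformation > 0 ? K / totalInformation : 0
    return (K, normalized)
}

// Accept a prediction only if both the mode-class probability and the
// normalized Knowledge meet the chosen threshold.
func acceptPrediction(probability: Double, classCounts: [Int],
                      numberOfClasses c: Int, threshold: Double) -> Bool {
    let (_, normalizedK) = knowledge(classCounts: classCounts, numberOfClasses: c)
    return probability >= threshold && normalizedK >= threshold
}
```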

If you do this, it works quite well, and below is a plot of accuracy as a function of a threshold applied to both the probability and the Knowledge, with Knowledge adjusted to a [0, 1] scale, given 5,000 rows of the MNIST Fashion Dataset. The effect of this is to require an increasingly high probability, and increasing confidence in your measure of that probability. The dataset is broken into 4,250 random training rows and 750 random testing rows, and this is done 150 times. The accuracy shown below is the average over each of the 500 runs, as a function of the minimum threshold, which is again applied to both the probability and the confidence. The reason the accuracy drops to 0 at a certain point is that there are no rows left that satisfy the threshold. The accuracy peaks at 99.647%, and again, this is without any training step, so it is, in fairness, an unsupervised algorithm.

The results are comparable for the MRI Brain Cancer Dataset (max accuracy of 100%) and the Harvard Skin Cancer Dataset (max accuracy of 93.486%).

Cluster-Based Prediction

This method is already baked into some of my most recent posts, but I wanted to call attention to it in isolation, because it is interesting and useful. Specifically, my algorithms are generally rooted in a handful of lemmas and corollaries I introduced, which prove that the nearest neighbor method produces perfect accuracy when classifications don’t change over small fixed distances. That is, if I’m given a row vector x from the dataset, and the set of points near x has the same classifier as x, then the nearest neighbor algorithm can be modified slightly to produce perfect accuracy. I’ve introduced a ton of software that allows you to find that appropriate distance, which I call \delta. The reason this approach (which I came up with a while ago) is interesting is that it doesn’t require any supervision –

It uses a fixed, closed-form formula to calculate \delta, as opposed to a training step.

This results in a prediction algorithm that has an O(N) runtime, and the accuracy is consistently better than nearest neighbor. Note that you can also construct an O(\log(N)) runtime algorithm using the code attached to my paper, “Sorting, Information, and Recursion”.
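Putting the pieces together, here is a sketch of the O(N) prediction in Swift: \delta is taken as an input (the closed-form formula itself is not reproduced in this post), the cluster around x is gathered in a single pass, and its mode class is returned. Falling back to the single nearest neighbor when the cluster is empty is an assumption of this sketch, not something specified above.

```swift
// A sketch of the O(N) cluster-based prediction: gather the cluster around x in
// a single pass and return its mode class. Delta is an input here because the
// closed-form formula is not reproduced in this post; falling back to the single
// nearest neighbor when the cluster is empty is an assumption of this sketch.
func clusterBasedPrediction(x: [Double], rows: [[Double]], labels: [Int],
                            delta: Double) -> Int {
    func distance(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
    }
    var counts: [Int: Int] = [:]
    var nearestLabel = labels[0]
    var nearestDistance = Double.infinity
    for (i, row) in rows.enumerated() {                // single O(N) pass
        let d = distance(row, x)
        if d <= delta { counts[labels[i], default: 0] += 1 }
        if d < nearestDistance {
            nearestDistance = d
            nearestLabel = labels[i]
        }
    }
    if let best = counts.max(by: { $0.value < $1.value }) {
        return best.key                                // mode class of the cluster
    }
    return nearestLabel                                // empty cluster: fall back
}
```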

Here are the results as applied to the MNIST Fashion Dataset using 7,500 randomly selected rows, run 500 times on 500 randomly selected training / testing datasets:

1. Nearest Neighbor:

Accuracy: 87.805% (on average)

2. Cluster-Based:

Accuracy: 93.188% (on average)

Given 500 randomly selected training / testing datasets, the cluster-based method beat the accuracy of the nearest neighbor method 464 times, the nearest neighbor method beat the cluster-based method 0 times, and they were equal 36 times. The runtime from start to finish, running on an iMac, is a few seconds (for a single round of predictions), including preprocessing the images.