# Supervised Prediction

There’s some code in my library (as of 6/21/21) that I never got around to writing about. It generates a value of delta for each row in the dataset independently, effectively implementing the ideas I go through in this paper on dataset consistency. As a practical matter, this tells you how far you can go from a given point in the dataset before you encounter your first inconsistent classification. Specifically, if $x_i$ is the vector for row $i$ of the dataset, the algorithm finds the distance $\delta_i$ such that any sphere centered at $x_i$ with a radius of $\bar{\delta} > \delta_i$ will contain a vector whose class differs from the class of $x_i$. Obviously, you can use this to do supervised prediction: run the nearest neighbor algorithm, and reject any prediction that matches to row $i$ but is further away from $x_i$ than $\delta_i$.
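
To make the idea concrete, here is a minimal brute-force sketch in plain Python. It is not the code from my library. It assumes Euclidean distance and takes $\delta_i$ to be the distance from $x_i$ to the nearest training row with a different class, so any sphere centered at $x_i$ with a radius greater than $\delta_i$ contains a differently classified vector. The function names `compute_deltas` and `predict` are hypothetical, chosen for illustration.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def compute_deltas(X: List[Vector], y: List[int]) -> List[float]:
    """Return delta_i for each training row i (hypothetical helper).

    Assumption: delta_i is the Euclidean distance from x_i to the nearest
    training row whose class differs from y[i]. Rows with no differently
    classified counterpart get math.inf.
    """
    deltas = []
    for i, xi in enumerate(X):
        d = math.inf
        for j, xj in enumerate(X):
            if y[j] != y[i]:
                d = min(d, math.dist(xi, xj))
        deltas.append(d)
    return deltas


def predict(x: Vector, X: List[Vector], y: List[int],
            deltas: List[float]) -> Optional[int]:
    """Nearest-neighbor prediction with rejection (hypothetical helper).

    Finds the nearest training row j and returns its class, unless x is
    further from x_j than delta_j, in which case it returns None (rejected).
    """
    best_j = min(range(len(X)), key=lambda j: math.dist(x, X[j]))
    if math.dist(x, X[best_j]) > deltas[best_j]:
        return None
    return y[best_j]


# Toy usage: two class-0 points near the origin, one class-1 point at (10, 0).
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
y = [0, 0, 1]
deltas = compute_deltas(X, y)
print(predict((0.5, 0.0), X, y, deltas))   # matched row is well inside its delta
print(predict((20.0, 0.0), X, y, deltas))  # too far from row (10, 0): rejected
```

Both loops are quadratic in the number of training rows, which is fine for a sketch; a real implementation would vectorize the distance computations.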

This is exactly what I did in my first set of A.I. algorithms, and it substantially improves accuracy. Specifically, using just 5,000 training rows from the MNIST Numerical dataset, this method achieves an accuracy of 99.971%, and it takes about 4 minutes to train. The downside is that you reject a lot of predictions; in this case, 30.080% of the testing rows were rejected. But by definition, the rejected rows from the testing dataset are inconsistent with the training dataset, so as a practical matter, you need more data to fill the gaps in the training dataset. The point of the algorithm is that it lets you hit really high accuracies with not much data, and the bottom line is that it catches predictions that would otherwise have been errors.
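
Because some predictions are rejected, two numbers describe a run: the accuracy and the fraction of testing rows rejected. The sketch below assumes accuracy is measured only over the accepted (non-rejected) predictions, with `None` marking a rejected row; `score_with_rejection` is a hypothetical helper, not part of my library.

```python
import math
from typing import List, Optional, Tuple


def score_with_rejection(predictions: List[Optional[int]],
                         labels: List[int]) -> Tuple[float, float]:
    """Return (accuracy over accepted predictions, fraction rejected).

    Assumption: a prediction of None means the testing row was rejected,
    and rejected rows are excluded from the accuracy calculation.
    """
    accepted = [(p, t) for p, t in zip(predictions, labels) if p is not None]
    rejected_fraction = 1 - len(accepted) / len(labels)
    if not accepted:
        # Every row was rejected, so accuracy is undefined.
        return math.nan, rejected_fraction
    correct = sum(p == t for p, t in accepted)
    return correct / len(accepted), rejected_fraction
```

For example, predictions `[1, None, 0, 2]` against labels `[1, 1, 0, 3]` give an accuracy of 2/3 over the three accepted rows, with one quarter of the rows rejected.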