Confidence-Based Prediction Search

I’ve written the Octave code for what I’m calling a “Magic Button”, that allows you to search through predictions, and find a subset of the dataset that generates the desired accuracy. This is possible because of a confidence-based prediction algorithm, which assigns each prediction a confidence score, using a method I developed rooted in information theory. As you increase the confidence threshold, the number of predictions that satisfy the confidence threshold generally decreases, and the accuracy generally increases. You can read about the method in my paper, “Information, Knowledge, and Uncertainty“.

The “Magic Button” algorithm simply takes an argument which is an accuracy threshold, and searches through the resultant curve for all points that are within 2.5% of the argument. This allows you to say, e.g., I want 90% accuracy, and then the “Magic Button” will find all points on the accuracy curve that satisfy this criteria (within 2.5%), the tie breaker being the number of rows associated with each point on the accuracy curve (remember, the higher the accuracy threshold, the lower the number of rows that meet the threshold). In the Mac OS version, this will literally be a button (see above), with a field beside it, that allows you to say, I want a given accuracy, and then all rows that satisfy that criteria are automatically saved to a file as a CSV dataset, which in the picture above, is the “Magic Button Dataset”. This will allow you to isolate the highest performing subset of your dataset, which is part of the overall economic philosophy behind the software, which is to impose efficiency on the market for A.I., since you can then punt the problem rows to a custom model, allowing an admin, e.g., to take responsibility for not only all routine matters of machine learning, but all rows of any dataset that can be automatically solved for as a group, leaving only the balance to a data scientist. If of course you request 100% accuracy, and it’s just not possible, then you’re going to get an error, and that’s life, but as a general matter, this method allows you to achieve well over 90% accuracy, given any reasonably well-made dataset.

Attached is code that applies this method to the UCI Credit Dataset, all 30,000 rows, and above is a plot of accuracy as a function of confidence (the increasing curve), together with another curve that shows the percentage of the rows that satisfy the confidence threshold (the decreasing curve). The target accuracy for the Magic Button is set to 92.5%, and returns an actual accuracy of 93.103%, with only 29 rows surviving at that accuracy. Note that this is distinct from the software that initially tested the method, in that this software runs exactly once (because it has to as a commercial product), producing a jagged curve, whereas the previous software generated a curve on average, over hundreds of randomly generated training / testing dataset subsets, producing smoother curves (since it was testing a thesis). I will likely offer both in the final product, because random testing allows you to produce a smooth curve of accuracy as a function confidence, which you can then use when you don’t know the classifiers of the testing dataset, which is probably what’s going to happen in the real world. Specifically, you can say, e.g., this testing row has a confidence score of X, and on average, rows with a confidence score of X, have an accuracy of Y.

As a reminder, you can download Octave for free from GNU.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s