Massive Unsupervised Modal Classification

This is a sort-based version of the algorithm I discuss in Information, Knowledge, and Uncertainty, which uses the modal class of a cluster to predict the class of its geometric origin. I'm still testing it, but the accuracy seems excellent so far. It's the same technique as the other sort-based algorithms, which use sorting as a substitute for Nearest Neighbor. I proved that sorting has a deep connection to the Nearest Neighbor method in Sorting, Information, and Recursion, which forms the theoretical basis for these algorithms. The accuracies and runtimes shown below are averaged over 100 iterations. The testing percentage is set to 15% for all datasets (i.e., 100 rows produce 85 training rows and 15 testing rows). Accuracy generally increases as a function of confidence, and there are two measures of confidence: one information-based, using the equations I presented in Information, Knowledge, and Uncertainty, and the other size-based, which simply treats the cluster size itself as the measure of confidence (i.e., larger clusters are treated as more reliable than smaller clusters).
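
To make the mechanics concrete, here is a minimal Python sketch of the approach, assuming the Euclidean norm as the sort key and a fixed key-distance window ("delta") for building clusters; both choices, and the helper names, are illustrative assumptions rather than the actual implementation referenced below.

```python
# Hedged sketch: sort-based modal classification, assuming the Euclidean norm
# as the sort key and a fixed key-window "delta" for cluster construction.
# This is an illustration, not the implementation from the ResearchGate library.
import numpy as np
from collections import Counter

def fit_sort_index(X_train):
    """Sort the training rows by a scalar key (here, the Euclidean norm)."""
    keys = np.linalg.norm(X_train, axis=1)
    order = np.argsort(keys)
    return keys[order], order

def predict_modal(X_train, y_train, X_test, delta):
    """Predict each test row's class as the modal class of its sorted-order cluster."""
    sorted_keys, order = fit_sort_index(X_train)
    sorted_labels = np.asarray(y_train)[order]
    preds, sizes = [], []
    for x in X_test:
        k = np.linalg.norm(x)
        # Binary search over the sorted keys stands in for a Nearest Neighbor scan.
        lo = np.searchsorted(sorted_keys, k - delta, side="left")
        hi = np.searchsorted(sorted_keys, k + delta, side="right")
        cluster = sorted_labels[lo:hi]
        if cluster.size == 0:
            # Fall back to the single row nearest in sorted order.
            j = min(np.searchsorted(sorted_keys, k), len(sorted_keys) - 1)
            cluster = sorted_labels[j:j + 1]
        preds.append(Counter(cluster.tolist()).most_common(1)[0][0])
        sizes.append(cluster.size)  # cluster size doubles as the size-based confidence
    return np.array(preds), np.array(sizes)
```

The two np.searchsorted calls are the point of the method: once the training keys are sorted, locating each test row's cluster is a binary search rather than a distance calculation against every training row, which is why sorting can serve as a substitute for Nearest Neighbor.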

Dataset | Raw Accuracy | Max Accuracy (Information-Based Conf.) | Max Accuracy (Size-Based Conf.) | No. Rows | Runtime (Seconds)
UCI Abalone | 53.88% | 100.0% | 80.72% | 4177 | 0.460
UCI Credit | 72.12% | 100.0% | 79.69% | 2500 | 0.081
UCI Ion | 98.21% | 100.0% | 100.0% | 351 | 0.016
UCI Iris | 93.66% | 100.0% | 100.0% | 150 | 0.009
UCI Parkinsons | 97.78% | 100.0% | 100.0% | 195 | 0.006
UCI Spam | 80.18% | 100.0% | 100.0% | 4601 | 0.370
UCI Wine | 100.0% | 100.0% | 100.0% | 178 | 0.004
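
The "Max Accuracy" columns can be read as the best accuracy obtained when predictions are restricted to those at or above a confidence cutoff. As an illustration only (not the code from the library referenced below), here is a hedged sketch of that calculation, using, e.g., the cluster-size confidences returned by the sketch above; the information-based score from Information, Knowledge, and Uncertainty would simply take the place of conf.

```python
# Hedged sketch: accuracy as a function of a confidence cutoff, used here to
# illustrate the "raw" versus "max" accuracy distinction in the table above.
import numpy as np

def accuracy_by_confidence(preds, conf, y_true):
    """Return overall accuracy and the best accuracy over all confidence cutoffs."""
    preds, conf, y_true = np.asarray(preds), np.asarray(conf), np.asarray(y_true)
    correct = preds == y_true
    raw_accuracy = correct.mean()
    max_accuracy = raw_accuracy
    for t in np.unique(conf):
        # Keep only predictions whose confidence is at least t.
        max_accuracy = max(max_accuracy, correct[conf >= t].mean())
    return raw_accuracy, max_accuracy
```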

Here's the code; any missing functions can be found in my library on ResearchGate.
