# Alternative Normalization Algorithms

Out of curiosity, I experimented with alternative normalization algorithms, and the results are basically the same as those of my core approach, which is to iterate through different digit scales, run nearest neighbor at each one, and select the digit scale that generates the highest nearest neighbor accuracy. This works because you’re maximizing local consistency, by definition. The alternative is to run a different algorithm and test its accuracy instead; in this case, I ran a cluster prediction algorithm. Limited testing suggests it’s at best just as good, and possibly not as good, so given that nearest neighbor is incredibly fast when vectorized (i.e., O(number of rows)), there’s no practical reason to do otherwise.
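To make the core approach concrete, here is a minimal sketch in Python. It is not the code referenced in this post. It assumes that a "digit scale" means multiplying each column by a power of ten, and it scores each candidate with leave-one-out nearest neighbor accuracy. The function names, the exponent range, and the brute-force search over every per-column combination are all hypothetical choices made for illustration, and the plain loops are not vectorized.

```python
from itertools import product


def nearest_neighbor_accuracy(rows, labels):
    """Leave-one-out accuracy: each row is predicted by its nearest other row."""
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_dist = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((a - b) ** 2 for a, b in zip(row, other))
            if best_dist is None or dist < best_dist:
                best_j, best_dist = j, dist
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def scale_rows(rows, exponents):
    """Multiply column k by 10 ** exponents[k] (assumed meaning of 'digit scale')."""
    return [[v * 10.0 ** e for v, e in zip(row, exponents)] for row in rows]


def best_digit_scale(rows, labels, exponent_range=range(-2, 3)):
    """Try every per-column exponent combination and keep the most accurate one."""
    best = None
    for exponents in product(exponent_range, repeat=len(rows[0])):
        accuracy = nearest_neighbor_accuracy(scale_rows(rows, exponents), labels)
        if best is None or accuracy > best[1]:
            best = (exponents, accuracy)
    return best


# Toy data: column 0 separates the classes, column 1 is large-magnitude noise.
rows = [[0.1, 50.0], [0.12, -40.0], [0.9, 45.0], [0.92, -50.0]]
labels = [0, 0, 1, 1]
print(nearest_neighbor_accuracy(rows, labels))  # accuracy before any scaling
print(best_digit_scale(rows, labels))           # best exponents and their accuracy
```

The brute-force search grows exponentially with the number of columns, so it is only meant to show the selection rule, not to be practical on a real dataset.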

You can find an example of the alternate code on ResearchGate.

# Supervised Modal Prediction

I am of course in the process of writing my AutoML Software, and I decided to include a supervised version of my modal prediction algorithm (I’m not sure yet whether it will be in the free version).

The code is available on ResearchGate.

# UCI Sonar Dataset

I was on a Meetup video conference, and someone mentioned a dataset that doesn’t cluster well (the UCI Sonar Dataset), so I naturally did a bit of work on it, and it turns out the dataset really does contain very few clusters. Specifically, only about 34% of the rows are contained in spherical clusters. The average cluster size over all rows is about 0.8 elements per cluster, again suggesting that you’re not going to get good clustering out of this dataset, because there are no real clusters to begin with, just as a matter of geometry. Nearest Neighbor nonetheless performs reasonably well, with an accuracy of 82.692%.

Code attached.

# Updated “Magic Button” Code

I’ve updated the “Magic Button” code to accommodate a training / testing structure, because I have to do it anyway at some point, when I release the Pro Version of my AutoML Software, Black Tree. It’s the same algorithm; the only difference is that it draws the clusters from the training dataset, given the testing dataset, which keeps the wall up between the training and testing datasets.
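As an illustration of what keeping that wall up can look like, here is a small Python sketch. It is not the “Magic Button” code. It assumes a cluster is the set of training rows within a fixed radius `delta` of a testing row, and that the prediction is the most common label in that cluster. The function names and the `delta` parameter are hypothetical.

```python
from collections import Counter


def cluster_from_training(test_row, train_rows, delta):
    """Indices of the training rows within Euclidean distance delta of test_row."""
    members = []
    for i, row in enumerate(train_rows):
        dist = sum((a - b) ** 2 for a, b in zip(test_row, row)) ** 0.5
        if dist <= delta:
            members.append(i)
    return members


def modal_prediction(test_row, train_rows, train_labels, delta):
    """Predict the most common training label in the cluster around test_row."""
    members = cluster_from_training(test_row, train_rows, delta)
    if not members:
        return None  # empty cluster: decline to predict
    counts = Counter(train_labels[i] for i in members)
    return counts.most_common(1)[0][0]


train_rows = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_labels = ["a", "a", "b", "b"]
print(modal_prediction([0.2, 0.5], train_rows, train_labels, delta=1.0))    # a
print(modal_prediction([10.0, 10.0], train_rows, train_labels, delta=1.0))  # None
```

The testing row is only ever compared against training rows, so no testing labels or other testing rows leak into the cluster.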

# Thought on Measuring Continuity

When you’re given a function as observed, you will have discontinuity, so the question becomes: is the discontinuity the result of observation, or the result of the underlying function itself? And in each case, how can you measure that, given the observed data, which is likely all you have to work with? It just dawned on me that my paper, “Sorting, Information, and Recursion”, seems to address exactly this topic. Specifically, Equation (2) will increase as the distance between the terms in a sequence increases.

So what you can do is first test the data using Equation (2) as is, without sorting the data. For example, given $F(1) = 1, F(2) = 2, F(3) = -1$, we would take the difference between adjacent range values as is, producing the vector $(1,-3)$, and then calculate $\bar{H}$ using that vector. Then you sort the range values, in this case producing $(-1,1,2)$, and repeat the process, here producing the difference vector $(2,1)$. The degree to which $\bar{H}$ changes in the latter case is a measure of how continuous your data is, because continuous data will have small gaps in the range values and will, as a consequence, be locally sorted in some order.

Note that you should use the variant of Equation (2) I presented in Footnote 5, because for a continuous function, the distance between range values will likely be less than 1, and if you do that, then tighter and tighter observations will cause $\bar{H}$ to get closer and closer to $0$.
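The procedure can be sketched in a few lines of Python. Equation (2) is not reproduced in this post, so the sketch takes $\bar{H}$ as a function you pass in. The `mean_abs_gap` function below is only a stand-in so the example runs; it is not Equation (2) or its Footnote 5 variant, and all of the names are hypothetical.

```python
def adjacent_differences(values):
    """Differences between adjacent terms, e.g. [1, 2, -1] -> [1, -3]."""
    return [b - a for a, b in zip(values, values[1:])]


def continuity_comparison(range_values, h_bar):
    """Evaluate h_bar on the gaps of the data as observed and after sorting.

    h_bar is any function of a difference vector. In the post it would be
    Equation (2) of the paper, which has to be supplied by the caller.
    """
    unsorted_gaps = adjacent_differences(range_values)
    sorted_gaps = adjacent_differences(sorted(range_values))
    return h_bar(unsorted_gaps), h_bar(sorted_gaps)


def mean_abs_gap(gaps):
    """Stand-in statistic for illustration only, NOT Equation (2)."""
    return sum(abs(g) for g in gaps) / len(gaps)


observed = [1, 2, -1]  # F(1), F(2), F(3) from the example
print(adjacent_differences(observed))               # [1, -3]
print(adjacent_differences(sorted(observed)))       # [2, 1]
print(continuity_comparison(observed, mean_abs_gap))  # (2.0, 1.5)
```

The smaller the change between the two values, the closer the observed data already is to being locally sorted with small gaps.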

# Thought on Interpolation

I’m sure I shared it somewhere, though I’m not going to bother to look for it, but I developed a method for finding the degree of a function (as a polynomial) using differentiation. The idea is that you differentiate numerically some fixed number of times, simply taking the difference between adjacent terms each time. Then you find the derivative that is closest to zero, by taking the sum over its terms. For example, if you have range values $(3,4,5)$, taking the difference between adjacent terms produces the vector $(1,1)$, and doing that again produces $(0)$. In the real world, data will not be so nice, but you can still find the vector that is closest to zero by taking the sum over each vector. Now you know the next vector up is a constant function, the one after that is linear, the one after that is second degree, and so on. In the example, $(1,1)$ is constant, so $(3,4,5)$ is linear. Once you get back up to the original function, you know its degree as a polynomial, and you can then use simple Gaussian elimination to solve for its coefficients. This algorithm will be included in the Pro Version of my AutoML software.
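Here is a minimal Python sketch of the idea, under a few assumptions that are mine and not part of the original method: the samples are taken at evenly spaced inputs $x = 1, 2, 3, \ldots$, "closest to zero" is measured by the sum of absolute values (so that positive and negative terms cannot cancel), and the coefficients are fit through the first $d + 1$ points for a degree $d$ polynomial. The vector of differences just above the near-zero one is constant, so the degree is one less than the number of differencing steps. The function names are hypothetical.

```python
from fractions import Fraction


def differences(values):
    """Differences between adjacent terms, e.g. (3, 4, 5) -> [1, 1]."""
    return [b - a for a, b in zip(values, values[1:])]


def estimate_degree(values):
    """Degree = (number of differencing steps to get closest to zero) - 1."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    best_order, best_score = 0, None
    current = list(values)
    for order in range(1, len(values)):
        current = differences(current)
        score = sum(abs(v) for v in current)  # assumption: absolute values
        if best_score is None or score < best_score:
            best_order, best_score = order, score
    return best_order - 1


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with back substitution, using exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    solution = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_polynomial(values):
    """Coefficients [c0, c1, ...] of c0 + c1*x + ..., with x = 1, 2, 3, ..."""
    degree = estimate_degree(values)
    xs = range(1, degree + 2)  # assumption: evenly spaced inputs starting at 1
    matrix = [[x ** p for p in range(degree + 1)] for x in xs]
    return solve_linear_system(matrix, values[: degree + 1])


print(estimate_degree([3, 4, 5]))  # 1
print(fit_polynomial([3, 4, 5]))   # [Fraction(2, 1), Fraction(1, 1)], i.e. 2 + x
```

On noisy data the exact fit through the first few points would be fragile, so a least-squares fit at the estimated degree is probably the more sensible choice in practice; the exact solve is used here only because it matches the Gaussian elimination step described above.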