UCI Credit Dataset

I’ve applied my knowledge-based prediction algorithm to the UCI Credit Dataset (which I’ve attached, already formatted), and the results are the same as always, peaking near 100% accuracy (using only 3,500 rows). What’s interesting is the use of ordinal values, in this case levels of education. Though I didn’t do so explicitly here, you could simply apply my normalization algorithm to those dimensions that contain ordinal values. This will cause the normalization algorithm to test different orders of magnitude for the ordinal dimensions, thereby solving for the appropriate spacing between ordinal values. So for example, if the possible ordinal rankings are (1, 2, 3), then this application of the normalization algorithm will test, e.g., (10, 20, 30), (100, 200, 300), etc., each of which will produce a different accuracy, with the best one selected.
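A minimal sketch of the scale search described above, in Python. This is not the attached code: leave-one-out nearest neighbor stands in for the prediction algorithm, and the function names and toy rows are assumptions made purely for illustration.

```python
def loo_nearest_neighbor_accuracy(rows, labels):
    """Leave-one-out 1-nearest-neighbor accuracy (a stand-in predictor)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            # Squared Euclidean distance between the two rows.
            d = sum((a - b) ** 2 for a, b in zip(row, other))
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def best_ordinal_scale(rows, labels, ordinal_col, exponents=range(0, 4)):
    """Multiply one ordinal column by 10**k and keep the most accurate k."""
    best = None
    for k in exponents:
        factor = 10 ** k
        scaled = [
            row[:ordinal_col] + [row[ordinal_col] * factor] + row[ordinal_col + 1:]
            for row in rows
        ]
        accuracy = loo_nearest_neighbor_accuracy(scaled, labels)
        # Strict comparison keeps the smallest factor among ties.
        if best is None or accuracy > best[1]:
            best = (factor, accuracy)
    return best


# Hypothetical toy data: column 0 is a numeric feature, column 1 an ordinal
# level (1 or 3), and the class depends only on the ordinal level.
rows = [[10.0, 1], [50.0, 1], [12.0, 3], [52.0, 3]]
labels = [0, 0, 1, 1]
print(best_ordinal_scale(rows, labels, ordinal_col=1))  # (100, 1.0)
```

In this toy case the ordinal levels only start to dominate the distance at a factor of 100, which is why the smaller scales score worse.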

Below is a plot of accuracy as a function of knowledge, which you can read about in the first paper linked to above:

Note that this will not work for truly arbitrary labelled data, where the values have no obvious numerical meaning (e.g., colors). In that case, you could iterate through different ordinal rankings of the labels (i.e., permutations), and then use the normalization method to solve for the best numerical values for the labels under each ordinal ranking. Nonetheless, attached is command-line code that both normalizes and runs predictions on the UCI Credit Dataset. This is going to be included in some commercial release of my software, but given that it’s so powerful, I’m probably not going to release it on the Apple App Store as part of my Kickstarter Campaign (at least not this version of it).
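The permutation idea for arbitrary labels mentioned above could be sketched roughly as follows. The `score` callback is a placeholder for whatever accuracy test is used (for instance, normalization followed by prediction), and `best_label_ranking` is a hypothetical name, not something from the attached code. Note that the number of rankings grows factorially with the number of distinct labels.

```python
from itertools import permutations


def best_label_ranking(labels, score):
    """Try every ordinal ranking of the distinct labels; keep the best-scoring one."""
    distinct = sorted(set(labels))
    best = None
    for ordering in permutations(distinct):
        # Assign ranks 1, 2, 3, ... to the labels in this ordering.
        encoding = {label: rank for rank, label in enumerate(ordering, start=1)}
        encoded = [encoding[label] for label in labels]
        current = score(encoded)
        if best is None or current > best[1]:
            best = (encoding, current)
    return best


# Hypothetical example: the score rewards encodings close to a known target.
colors = ["red", "green", "blue", "red"]
target = [3, 1, 2, 3]
closeness = lambda encoded: -sum(abs(a - b) for a, b in zip(encoded, target))
encoding, value = best_label_ranking(colors, closeness)
print(encoding, value)
```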

UCI Credit Dataset.csv

Simple Normalization Script

As noted, I’m writing a MacOS version of my core A.I. software, and because Swift is not vectorized, it’s not exactly the same, which requires me to reevaluate my code. In this case, I took a look at my normalization script, and it’s so complicated that I simply redid it in Octave, for the sole purpose of rewriting it in Swift (see attached). The idea is the same:

You cluster the dimensions by number of digits, and then iteratively reduce the largest dimensions (i.e., the cluster that contains the largest dimension) by dividing by powers of ten, and test the accuracy at each scale. This is just an easier way of expressing the same idea.
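One possible reading of the procedure above, sketched in Python rather than Octave or Swift. It is an interpretation, not the attached script: the `score` callback stands in for the accuracy test, the stopping rule (stop once every dimension has the same number of digits) is an assumption, and all names are hypothetical.

```python
def digit_count(value):
    """Number of digits in the integer part of |value| (zero counts as one digit)."""
    return len(str(int(abs(value))))


def normalize_by_digits(rows, score):
    """Repeatedly divide the largest-digit cluster by ten, scoring each scale.

    Returns (shifts, best_score), where shifts[d] is the power of ten that
    dimension d is divided by at the best-scoring scale.
    """
    n_dims = len(rows[0])
    # Cluster key: the largest digit count seen in each dimension.
    digits = [max(digit_count(row[d]) for row in rows) for d in range(n_dims)]
    shifts = [0] * n_dims

    def rescaled():
        return [[row[d] / 10 ** shifts[d] for d in range(n_dims)] for row in rows]

    best_shifts, best_score = list(shifts), score(rescaled())
    while True:
        effective = [digits[d] - shifts[d] for d in range(n_dims)]
        largest = max(effective)
        if largest == min(effective):
            break  # every dimension is on the same scale; nothing left to reduce
        for d in range(n_dims):
            if effective[d] == largest:
                shifts[d] += 1  # divide the whole largest cluster by ten
        current = score(rescaled())
        if current > best_score:
            best_shifts, best_score = list(shifts), current
    return best_shifts, best_score


# Hypothetical usage: the toy score prefers dimensions with similar spreads.
rows = [[1200.0, 3.0], [3400.0, 7.0]]

def spread_gap(rs):
    spreads = [max(r[d] for r in rs) - min(r[d] for r in rs) for d in range(2)]
    return -abs(spreads[0] - spreads[1])

shifts, _ = normalize_by_digits(rows, spread_gap)
print(shifts)  # [3, 0]
```

Tracking integer shifts instead of repeatedly dividing the data keeps the digit counts exact and avoids accumulating floating-point error.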

Apparent Paradox Between Set Theory and the Fibonacci Generating Function

I was reminded of a result I first saw in Mathematical Problems and Proofs, that shows that the generating function for the Fibonacci Sequence implies that the sum over all Fibonacci numbers is -1.

There’s a very good proof here in the first response to the question:

https://math.stackexchange.com/questions/338740/the-generating-function-for-the-fibonacci-numbers

Simply set z = 1, and you have the sum in question, which is plainly -1.
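For reference, the substitution can be written out as follows, assuming the indexing F_1 = F_2 = 1 (the linked answer may index differently, though starting from F_0 = F_1 = 1 gives 1/(1 - z - z^2), which also equals -1 at z = 1). The power series itself only converges for |z| < 1/\varphi, so the value at z = 1 comes from the closed form rather than from the series:

```latex
\begin{align*}
F(z) &= \sum_{n \ge 1} F_n z^n = \frac{z}{1 - z - z^2},
  \qquad |z| < \frac{1}{\varphi} = \frac{\sqrt{5} - 1}{2} \approx 0.618, \\
F(1) &= \frac{1}{1 - 1 - 1} = -1 .
\end{align*}
```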

Of course, the sum diverges, which initially led me to view this result as a mere curiosity, though it just dawned on me that there’s an additional problem that I think is tantamount to a paradox:

Let S be the set produced by the union of disjoint sets of sizes F_1, F_2, \ldots, where F_i is the i-th Fibonacci number. It must be the case that S has a cardinality of \aleph_0. More troubling, addition of integers can be put into a one-to-one correspondence with unions of disjoint sets, and thus, we have an apparent paradox.

To be clear, this has nothing to do with convergence. The series should in fact sum to infinity, and per the generating function it does not, whereas a perfectly corresponding union over sets does.

This example implies that the rules of algebra fail in some cases given an infinite number of terms, whereas set theory does not. One initial observation: addition is plainly not commutative with an infinite number of terms. For example, consider an alternating sum of (+1,-1,+1,-1,...). If you sum from left to right, you can cause the sum to oscillate near any finite value, or to diverge to positive or negative infinity, without changing the terms at all, simply by changing their order. So as a consequence, it must be the case that the rules of algebra require reconsideration in the context of an infinite number of terms. I’m not suggesting that this is what’s driving the apparent paradox above, but rather pointing to the general issue that mechanical application of the rules of algebra to an infinite number of terms is not appropriate, and this example plainly demonstrates that fact.
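The reordering claim for the alternating sum can be checked numerically. The sketch below is only an illustration with hypothetical function names: it draws from the same infinite supply of +1 and -1 terms, just in a different order, and records the running totals.

```python
def steer_partial_sums(target, n_terms):
    """Greedy reordering: take +1 while below the target, otherwise take -1.

    Both signs are used infinitely often as n_terms grows, so this is a
    reordering of the terms (+1, -1, +1, -1, ...), not a different series.
    """
    total, sums = 0, []
    for _ in range(n_terms):
        total += 1 if total < target else -1
        sums.append(total)
    return sums


def drifting_partial_sums(n_terms):
    """Reordering that takes two +1 terms for every -1 term, so the total grows."""
    total, sums = 0, []
    for k in range(n_terms):
        total += -1 if k % 3 == 2 else 1
        sums.append(total)
    return sums


print(steer_partial_sums(5, 12))       # climbs to 5, then alternates 4, 5, 4, ...
print(drifting_partial_sums(300)[-1])  # 100, and it keeps growing with n_terms
```

Swapping the roles of +1 and -1 in the second function gives a reordering whose running total drifts toward negative infinity instead.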

Vectorized Correlation

Attached is some code that makes use of a measure of correlation I mentioned in my first real paper on A.I. (see the definition of “symm”) that I’ve finally gotten around to coding as a standalone measure.

The code is annotated to explain how it works, but the basic idea is that sorting reveals information about the correlation between two vectors of numbers. For example, imagine you have a set of numbers from 1 to 100, listed in ascending order, in vector x, and the numbers -1 to -100, in vector y, listed in descending order. This would produce the following plot in the (x,y) plane:

Now sort each set of numbers in ascending order, and save the resultant mappings of ordinals. For example, in the case of vector x, the list is already sorted in ascending order, so the ordinals don’t change. In contrast, in the case of vector y, the list is sorted in descending order, so ordinal 1 gets mapped to the last spot, ordinal 2 gets mapped to the second to last spot, and so on. This will produce another pair of vectors that represent the mappings generated by the sorting function, which for vector x will be s_x = (1, 2, \ldots, N), and for vector y will be s_y = (N, N-1, \ldots, 1), where N is the number of items in each vector. Therefore, by taking the difference between the corresponding ordinals in s_x and s_y, we can arrive at a measure of correlation, since it tells us to what extent the values in x and y share the same ordinal relationships, which is more or less what correlation attempts to measure. This can be easily mapped to the traditional [-1,1] scale, and the results are exactly what intuition suggests, which is that the example above constitutes perfect negative correlation, an increasing line constitutes perfect positive correlation, and adding noise, or changing the shape, diminishes correlation.
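A rough Python sketch of the sorting idea above. It is not the attached code, and the exact definition of “symm” is not reproduced here: the mapping to [-1,1] below (one minus twice the summed absolute ordinal differences, divided by their largest possible value, floor(N^2/2)) is an assumption chosen so that the two extreme cases come out as -1 and +1. The function names are hypothetical.

```python
def sort_ordinals(values):
    """Position (1-based) that each entry occupies after an ascending sort.

    Ties are broken by original position, which is an arbitrary choice.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ordinals = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ordinals[index] = position
    return ordinals


def ordinal_correlation(x, y):
    """Compare the sort ordinals of x and y and map the result to [-1, 1]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    s_x = sort_ordinals(x)
    s_y = sort_ordinals(y)
    total = sum(abs(a - b) for a, b in zip(s_x, s_y))
    # The summed differences are largest, floor(n^2 / 2), when one order is
    # the exact reverse of the other.
    max_total = (n * n) // 2
    return 1.0 - 2.0 * total / max_total


x = list(range(1, 101))  # 1, 2, ..., 100
y = [-v for v in x]      # -1, -2, ..., -100
print(ordinal_correlation(x, y))  # -1.0
print(ordinal_correlation(x, x))  # 1.0
```

Anything between the two extremes, such as a noisy line, lands strictly inside the interval under this mapping.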

Because I’ve abstracted sorting using information theory, you could, I suppose, measure the correlation between any two ordered sets of mathematical objects.

Also attached is another script that uses basically the same method to measure correlation between numerical data and ordinal data. The specific example attached allows you to measure which dimensions in a dataset (numerical) are most relevant to driving the value of the classifier (ordinal).

Black Tree AutoML – MacOS

I’ve completed an initial free version of my software for MacOS, and it’s a stand-alone executable you can download below, together with a few example datasets. Here’s a video that shows you how to use the software, but it’s really straightforward:

Download Executable (click on the ellipsis on the far right of the screen, and select “Download”).

Download Example Datasets (Courtesy of UCI)

Swift Source Code