Rank Correlation and Randomness

I came up with a measure of rank correlation when I was living in Sweden, that I abandoned because I was working on A.I. and physics, and it turns out, I could use it, in order to calculate the correlation between confidence and accuracy. A user on Reddit pointed out to me that something similar is already known, named the Kendall Tau Rank Correlation. The difference between what I still believe to be an original measure of correlation, and the Kendall Tau measure, is that Kendall’s measure counts the number of unequal ordinals. In contrast, my measure takes the difference between corresponding ordinals, treating them as cardinal values, for this purpose (i.e., it’s not just that 3 is unequal to 5, it’s that the absolute difference is 2). As a result, if the sorting process moves an entry from the front of the list to back of the list, that’s a measurably bigger change, than moving that entry by one index. Putting aside the question of whether or not my measure is original, something truly strange has again appeared on my radar, and I ignored it the first time because I had too much to do:

Applying my measure, a list in random order, is closer to backwards, than forwards, as measured by rank correlation.

To unpack this, when you calculate rank correlation, what you’re doing is, taking two lists of numbers, sorting one, and using the resultant permutation of ordinals, to sort the second list of numbers. So for example, given x = (1,2,3,4,5) (i.e., the domain) and y = (1, 5, 3, 2, 6) (i.e., the range), sorting the range will produce the permutation-vector (1,4, 3, 2, 5), in that the position of the first entry of y is unchanged in the sorted list of y, the position of the second entry of y has moved to the fourth entry of the sorted list of y, and so on. My understanding is that Kendall Tau counts the number of unequal ordinals, so we would in this case compare, entry by entry, the vectors (1,2,3,4,5) == (1,4, 3, 2, 5). In that case, the second and fourth entries are unequal, with two entries contributing to our measure of correlation. Note that if two vectors are perfectly correlated, then the ordinals will not change, and so we would count a score of zero, all scores over zero therefore indicating a less than perfect correlation.

When calculating my presumably original measure of correlation, you would instead treat the ordinal values as cardinals, and so rather than count the number of discordant pairs, you would take the absolute value of their differences. So rather than comparing corresponding entries for equality, you would instead take the absolute value of the difference between corresponding vectors |(1,2,3,4,5) - (1,4, 3, 2, 5)| = (0, 2, 0, 2,0), and then take the sum, producing 4. As a consequence, position matters, and moving an entry, e.g., from the front of a list to the back is more significant, than a slight change in position.

So a backwards list would cause us to take the sum over the difference between the vectors C = \sum |(1,2,3, \ldots, N) - (N,N-1,N-2, \ldots, 1)|, which is given by Gauss’ formula.

 That’s not the interesting part –

The interesting part is, let M be the value of C, given a number of N items in reverse order (i.e., M is the value of C, given N elements, in a backward list). Now posit a random list, i.e., generate a randomly permuted list using a random number generator, and take the ratio of the value C, given that list, divided by M. Or, you can also take a linear function and incrementally scramble it, you find in both cases:

The ratio \frac{M}{C} seems to converge to \frac{2}{3}, as a function of N.

This suggests a random list is closer to backwards than sorted, using this metric.

There could be some basic math I’m missing, but this is interesting.

Specifically, both a perfectly sorted list and a perfectly backwards list have trivial Kolmogorov Complexities, and so, you would expect, intuitively, a random list to fall equally between the two, but this is just wrong, as it is plainly the case that when the rank correlation of a random list, is divided by the rank correlation of a backwards list, the ratio is roughly \frac{2}{3}, suggesting a random ordering is closer to a backwards list, than a forward sorted list. Again, both a forward sorted list and a backward sorted list have a trivial complexity:


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s