A Simple Theory of Randomness

The Shannon Entropy is not a good measure of structural randomness, for the simple reason that all uniform distributions maximize the Shannon Entropy. As a consequence, e.g., a sequence of alternating heads and tails (H,T,H,T, \ldots ) maximizes the Shannon Entropy, despite having an obvious structure. The Kolmogorov Complexity solves this, since the shortest program that generates an alternating sequence of heads and tails is obviously going to be much shorter than the length of the sequence, and therefore such a sequence is not Kolmogorov-Random, which requires the complexity to be the length of the sequence plus a constant.

However, Kolmogorov Randomness is too strong of a requirement, since there are sequences that are intuitively random, that are not Kolmogorov Random. Specifically, consider e.g., a Kolmogorov Random string x. Now partition that string into substrings (x_1, \ldots, x_k). This collection of substrings must also be Kolmogorov Random, for assuming otherwise implies that we can compress x, which contradicts our assumption that x is Kolmogorov Random. We can then interpret each x_i as a binary number. Now construct a new binary string y, such that y consists of x_1 1’s, in a sequence, followed by a 0, then x_2 1’s, in a sequence, followed by a 0, etc. That is, y is constructed by treating each x_i as the length of a sequence of 1’s, that is followed by a 0. We can therefore specify y by specifying each x_i, which requires A(y) = \sum_{i = 1}^k \log(x_i) + C bits. As a consequence, if A(y) < |y|, then simply specifying each sequence length will compress y. Nonetheless, sequence y is intuitively random, because the indexes of the 0’s are given by a Kolmogorov Random string. However, if the numbers encoded by each x_i are sufficiently large, then A(y) could be significantly less than |y|, and as such, y would not be Kolmogorov Random.

This suggests a simple and intuitive definition of a random string, where a string y is random if K(y) \geq A(y). That is, if the subsequences of a string cannot be compressed beyond identifying their lengths, then the string is random, even though it is not Kolmogorov Random. Note that it doesn’t matter whether we identify the 0’s or 1’s in a string, since we can simply take the complement of the string, which requires an initial program of constant length that does not depend upon the string, thereby increasing the Kolmogorov Complexity by at most another constant.

This is quite nice, because it’s simple, and it allows for sequences that have arbitrarily long periods of stability to still be treated as random. And as a consequence, if a sequence has an extremely low entropy, it could still be random in this view, if its subsequences cannot be compressed beyond identifying their lengths.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s