Assertion and Exclusion

It dawned on me last night that it’s difficult to do worse than chance when predicting a random variable. Specifically, if the lowest-probability event in a distribution has probability e.g. \frac{1}{3}, it’s not clear how you could achieve an accuracy lower than \frac{1}{3}, though I’m not an expert on probability, so there could be a known result on point. I noticed this because one of my algorithms routinely performs worse than chance, and I never cared, because it uses confidence filtering to produce accuracy that is on par with other machine learning algorithms. See Information, Knowledge, and Uncertainty [1], generally. But it’s actually astonishing upon reflection. For example, try to underperform 50% accuracy when predicting a fair coin. I don’t think it’s possible over a long run, though again there could be a result on point. In fact, if you can underperform chance, it suggests that you have information about the system. For example, if I know the next outcome will be heads, I could deliberately select tails, which would eventually drive my accuracy to zero.
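A minimal simulation makes the argument concrete: a blind guesser cannot reliably underperform a fair coin, whereas a saboteur who knows each outcome in advance (i.e., has information about the system) can drive accuracy to zero by always picking the other side. This is an illustrative sketch, not the algorithm from [1].

```python
import random

random.seed(0)

N = 100_000
flips = [random.choice(("H", "T")) for _ in range(N)]

# Blind guessing: no information about the flips, so accuracy hovers near 1/2.
blind_hits = sum(random.choice(("H", "T")) == f for f in flips)

# Informed sabotage: knowing each outcome, deliberately pick the opposite side.
sabotage_hits = sum(("T" if f == "H" else "H") == f for f in flips)

print(blind_hits / N)     # close to 0.5
print(sabotage_hits / N)  # exactly 0.0, but only because the outcome was known
```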

The net point being that if your accuracy is under 50%, you actually know that your answer is probably wrong, which is positive information regarding whatever system you’re attempting to predict. Specifically, if your accuracy is less than 50%, then it’s more likely that the correct answer is in the complement of your prediction (i.e., the set that contains every outcome other than your prediction). This sounds trivial, but it’s not, because it’s a different kind of information. When accuracy is above 50%, the most likely answer is a set with one element, asserting a possibility. When accuracy is below 50%, the most likely answer is the complement of a set, excluding a possibility. In the extreme case, when your accuracy is significantly below chance, this allows you to exclude a possibility with high confidence, which is often useful. That said, it’s not clear how this could happen, but again, the algorithm I describe in [1] does exactly that, routinely.
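The assertion-versus-exclusion flip can be sketched numerically. Below, a hypothetical predictor over three outcomes is assumed to be right only 20% of the time, well below the \frac{1}{3} chance level; reading its output as an exclusion (“the answer is anything but the prediction”) then succeeds 80% of the time. The predictor and its accuracy are stand-ins, not the method from [1].

```python
import random

random.seed(1)

OUTCOMES = ["a", "b", "c"]
N = 100_000

def bad_predictor(true_outcome):
    # Hypothetical predictor: correct only 20% of the time,
    # well below the 1/3 chance level for three outcomes.
    if random.random() < 0.20:
        return true_outcome
    return random.choice([o for o in OUTCOMES if o != true_outcome])

truth = [random.choice(OUTCOMES) for _ in range(N)]
preds = [bad_predictor(t) for t in truth]

# Assertion: claim the answer IS the prediction.
assertion_acc = sum(p == t for p, t in zip(preds, truth)) / N

# Exclusion: claim the answer lies in the complement of the prediction.
exclusion_acc = sum(p != t for p, t in zip(preds, truth)) / N

print(assertion_acc)  # roughly 0.20
print(exclusion_acc)  # roughly 0.80
```

By construction the two accuracies sum to one, so any predictor reliably below chance yields an exclusion rule reliably above it.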

One possibility is that there really is a fundamental principle that accuracy increases as a function of the information upon which a prediction is based, and that if you can reduce or increase that information, you can achieve arbitrarily low or high accuracy, regardless of what you’re trying to predict. This is consistent with another astonishing observation I made, that you can actually predict random variables with high accuracy using the techniques I outline in [1], which should otherwise be impossible. This does not contradict the laws of probability at all, since it’s only a subset of the output of a random source that can be predicted, but it’s nonetheless a counterintuitive, and potentially profound, result.