# Quantifying the Scientific Method

This morning I realized that you can relate error to uncertainty rigorously. I explained the mathematics of this in a previous article, showing that you can associate the distance between two vectors $x$ and $y$ with an amount of information given by $I = \log(l)$, where $l = ||x - y||$ is the norm of the difference between the two vectors. We can therefore also associate the total error $\epsilon$ between some prediction function $f$ and the correct underlying function $F$, over some domain, with an amount of information given by $\log(\epsilon)$.
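As a quick numerical sketch of this setup: the functions `f` and `F` below are arbitrary one-dimensional examples chosen purely for illustration, sampled on a grid, with the total error taken as the norm of their difference over that grid.

```python
import numpy as np

# Hypothetical prediction function f and true underlying function F;
# both are illustrative assumptions, not taken from the article.
def f(x):
    return 1.1 * x          # imperfect linear prediction

def F(x):
    return x                # true underlying function

# Sample the domain over which the two functions are compared.
domain = np.linspace(0.0, 10.0, 100)

# Total error: the norm of the difference between f and F over the domain.
epsilon = np.linalg.norm(f(domain) - F(domain))

# The amount of information associated with that error, log(epsilon).
I_error = np.log(epsilon)
```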

Separately, I also showed that there’s a simple equation that relates information, knowledge, and uncertainty, as follows:

$I = K + U.$

Intuitively, your uncertainty with respect to your prediction function $f$ is a function of your error $\epsilon$, for the simple reason that as error increases, your confidence in your prediction decreases. So let’s set the uncertainty in the equation above to,

$U = \log(1 + \epsilon)$.

This implies that when your error is zero, your uncertainty is also zero, and that your uncertainty grows as an unbounded function of your error.
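These two properties are easy to check numerically; the error values below are arbitrary, chosen only to exercise the function.

```python
import numpy as np

# Uncertainty as a function of error: U = log(1 + epsilon).
def uncertainty(epsilon):
    return np.log(1.0 + epsilon)

# Zero error gives zero uncertainty, since log(1) = 0.
assert uncertainty(0.0) == 0.0

# Uncertainty is strictly increasing in error, and grows without bound.
errors = np.array([1.0, 10.0, 100.0, 1e6])
assert np.all(np.diff(uncertainty(errors)) > 0)
```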

The total information of a static system should be constant, and so the value $I$ should in this case also be constant, since we are considering the relationship between two static functions, $f$ and $F$. This implies that whatever the value of our knowledge $K$ is, it must be the case that,

$K + U = C$,

for some constant $C$.

Because we have assumed that $U = \log(1 + \epsilon)$, it follows that $K = C - \log(1 + \epsilon)$. If we also require that knowledge be zero when error is zero, then $C = 0$, and the knowledge is given by the function,

$K = - \log(1 + \epsilon)$.

What’s interesting is that these equations together imply that,

$I = 0$.

Moreover, for all non-zero error, your knowledge is a negative number.
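Both claims can be verified directly — a minimal sketch reusing the same $U$ and $K$, with the error values chosen arbitrarily for illustration:

```python
import numpy as np

# Uncertainty and knowledge as functions of error, as defined above.
def uncertainty(epsilon):
    return np.log(1.0 + epsilon)

def knowledge(epsilon):
    return -np.log(1.0 + epsilon)

# The total information I = K + U is identically zero for any error...
for eps in [0.0, 0.5, 10.0, 1e6]:
    assert np.isclose(knowledge(eps) + uncertainty(eps), 0.0)

# ...and for any non-zero error, knowledge is a negative number.
assert all(knowledge(eps) < 0 for eps in [0.5, 10.0, 1e6])
```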

At first, I was puzzled by this, but upon reflection, it makes perfect sense, and is consistent with the scientific method generally:

If you have no error, then you know nothing;

If you have any error at all, then you know you’re wrong.

This is in contrast to the knowledge that is possible when dealing with systems whose components can be identified and defined with certainty, which I discuss in the articles linked to above. In this case, what you’re measuring is your absolute uncertainty given only your error with respect to some true underlying function over some domain. That alone does not allow you to make any claims about the behavior of the function outside of that domain, absent other assumptions limiting the possibilities for the underlying function. Said otherwise, in the absence of additional assumptions, the best case is that you know nothing: you’re not wrong over that domain, but your prediction function $f$ gives you no knowledge about $F$ outside of it.