Shannon’s equation is often used as a measure of uncertainty, and that’s not unreasonable: in his seminal paper on information theory, he gives a mathematical argument for why it works as one.
But I’ve introduced a different way of thinking about uncertainty, rooted in partial information, that is quite elegant: as you’re given partial information about a system, your uncertainty approaches zero as a matter of arithmetic.
However, this model doesn’t take the probability distribution of the system into account; it looks only at the number of states the system can be in. Your uncertainty is simply the logarithm of the number of states of the system.
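As a quick sketch (the function name is my own, and I'm assuming a base-2 logarithm, so uncertainty is measured in bits), this distribution-free model is just:

```python
import math

def uncertainty(num_states: int) -> float:
    """Uncertainty about a system, given only how many states it can be in."""
    return math.log2(num_states)  # base 2 measures uncertainty in bits

print(uncertainty(8))  # a system with 8 possible states -> 3.0 bits
```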
We can also take the distribution of the system’s states into account by instead calculating the expected number of states, which will be some portion of the total number of possible states.
So if a system is always stuck in some particular state, your uncertainty is much lower than it would be under a uniform distribution of states, and so is the expected number of unique states over any number of observations. Calculating the expected number of states is trivial: for each possible state of the system, divide the number of times it was observed by the number of times it would have been observed given a uniform distribution, cap that ratio at a maximum of one, and sum the results. This will always produce an expected number of states that is less than or equal to the actual total number of states. Your uncertainty is then simply the logarithm of the expected number of states of the system.
So for example, if a system can be in 10 states, and after 100 observations State A occurred 8 times, then that state would contribute 8/10 = 0.8 to the total expected number of states, since under a uniform distribution each state would have occurred 10 times. In contrast, if State B occurred 12 times, then that state would contribute 1 to the total, since we cap each contribution at 1. If instead all states occurred an equal number of times, then each state would contribute exactly 1 to the expected number of states, for a total of 10 states. As a result, if the distribution of states is uniform, you end up simply taking the logarithm of the number of states.
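The worked example above can be checked with a short sketch (again a base-2 logarithm, and the helper name is mine; I've assumed the remaining 80 observations are spread evenly over the other 8 states):

```python
import math

def expected_states(counts, num_states):
    """Expected number of unique states: each state's observed count,
    divided by the count it would have under a uniform distribution,
    capped at 1, then summed over all states."""
    total = sum(counts)
    uniform = total / num_states  # per-state count under a uniform distribution
    return sum(min(c / uniform, 1.0) for c in counts)

# 10 states, 100 observations: State A seen 8 times, State B seen 12 times,
# and the remaining 80 observations spread evenly over the other 8 states.
counts = [8, 12] + [10] * 8
print(expected_states(counts, 10))             # 0.8 + 1 + 8, i.e. about 9.8
print(math.log2(expected_states(counts, 10)))  # uncertainty, a bit under log2(10)

# Uniform case: the expected number of states equals the total number of states.
print(expected_states([10] * 10, 10))
```

Note that a heavily skewed distribution drives the expected number of states, and hence the uncertainty, down, while a uniform distribution recovers the simple log-of-the-state-count measure.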
The information content of a message is measured using the same method as in the previous article linked to above: simply measure the change in uncertainty that results from receipt of the message.
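Under this model, that measurement is just a subtraction (a minimal sketch, with my own function name, assuming both uncertainties use the same base-2 logarithm):

```python
import math

def information_content(states_before, states_after):
    """Information carried by a message: the uncertainty before receiving it
    minus the uncertainty after, each being the log of the (expected)
    number of states consistent with what you know at that point."""
    return math.log2(states_before) - math.log2(states_after)

# A message that narrows 16 equally likely states down to 2 carries
# log2(16) - log2(2) = 4 - 1 = 3 bits of information.
print(information_content(16, 2))
```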