# Using Information Theory to Explain Color Perception

RGB encoding is an incredibly useful representational schema that has helped facilitate the proliferation and current ubiquity of high quality digital images. Nonetheless, the color space generated by RGB vectors using its natural Euclidean boundaries as a subset of 3-space is not representative of how human beings actually perceive differences in color. We could of course chalk this up to subjectivity, or some biological processes that cause human beings to inaccurately distinguish between colors. Instead, I think that human beings perceive colors in a perfectly rational, efficient manner, and that information theory can help us understand why it is that human beings don’t perceive changes in color and luminosity linearly. Specifically, I believe that human beings naturally, and rationally, construct efficient codes for the colors they perceive, leading to changes in perceived colors that are logarithmic, rather than linear in nature.

The Information Content of a Color

The fundamental observation that underlies all of information theory is the following equation due to Claude Shannon:

$I = \log(1/p)$,

where $I$ is the optimal code length (measured in bits) of a signal with a probability of p. In this particular case, the ultimate goal is to take an RGB color vector, map it to a probability, and from that, generate a code length using the equation above that we will treat as the information content of the color represented by the RGB vector (see the posts below for more on this approach).

In previous posts, I used this method to measure the information content of a set of pixels, despite the fact that a set of pixels is not a source, but is instead a static artifact. Nonetheless, a set of pixels will have some distribution of colors, that, while fixed, can be viewed as corresponding to a probability distribution. As a result, we can measure the information content of a group of pixels by treating the frequency of each color as a probability, and then calculating the entropy of the resulting probability distribution. But unlike a set of pixels, a single color has no distribution, and is instead, using RGB encoding, a single 3-vector of integers. As a result, there’s no obvious path to our desired probability.

Let’s begin by being even more abstract, and attempt to evaluate the information content of a single color channel in an RGB vector. This will be some integer x with a value from 0 to 255. If we’d like to make use of the equation above, we’ll have to associate x with some probability, and generate a code for it, the length of which we can treat as the information content of the color channel. One obvious approach would be to view x as a fraction of 255. This will produce a number $p = x/255$ that can be easily interpreted (at least mathematically) as a probability. But upon examination, we’ll see that this approach is unsatisfying from a theoretical perspective.

First, note that $x$ is measure of luminosity that we intend to map to a probability. Luminosity is an objective unit of measurement that does not depend upon its source. As a result, the probability we assign to x should similarly not depend upon the particular source we’re making use of, if we want our method to be objective. That is, a particular level of luminosity should be associated with the same probability regardless of the source that is generating it.

To make things less abstract, imagine that we had a series of 5 identical light bulbs bundled together. The more lights we have on, the greater the luminosity generated by the device. If we assign probabilities to these luminosity levels based upon the portion of lights that are on, then using four out of five lights would correspond to a probability of 4/5. Our code length for that outcome would then be $\log(5/4)$. Now imagine that we have a group of 10 identical light bulbs bundled together. On this device, using 4 of the lights produces a probability of 4/10, and a code length of $\log(10/4)$. In both cases, the luminosity generated would be the same, since in both cases, exactly 4 lights are on. This implies that, as a general matter, if we use this approach, our code lengths will depend upon the particular light source used, and will, therefore, not be an objective mapping from luminosity to probability.

Instead, what we’re really looking for is something akin to what Ray Solomonoff called the “universal prior”. That is, we’d like an objective ex ante probability that we can ascribe to a given luminosity without any context or other prior information. If we give this probability physical meaning, in this case, our probability would be an objective ex ante probability for seeing a particular luminosity. This might sound like a tall order, but it turns out that by using Shannon’s equation above, we can generate priors that are useful in the context of understanding how human beings perceive colors, even if we don’t believe that they are actually universal probabilities. In short, I think that the human brain uses universal priors because they work, not because they’re actually correct answers to cosmic questions like, “what’s the probability of seeing blue?”

Information and Luminosity

Thanks to modern physics, we know that light is quantized, and comes in little “chunks” called photons. This means that more luminous light sources literally generate more photons, and therefore, more information. For those that are interested, I’ve written a fairly in-depth study of this topic, and others, which you can find here:

A Computational Model of Time-Dilation

This does not, however, imply that perceived luminosity will vary in proportion to the actual luminosity of the source. Instead, I assume that perceived luminosity is proportional to the amount of information triggered by signals generated by our sense organs upon observing a light source. Under this view, what human beings perceive as luminosity is actually a measure of information content, and not a measure of luminosity itself. That is, one light appears to glow brighter than another because the brighter light triggers a signal in our sense organs that contains more information than the signal triggered by the darker light source. This implies that sense organs that detect light are designed to measure information, not luminosity. Moreover, this implies that what we perceive as light is a representation of the amount of information generated by a light source, and not the actual luminosity generated by the light source. In crude terms, human vision is a bit counter, not a light meter.

If this is true, then it’s not the number of photons generated by a light source that we perceive, but the number of bits required to represent the number of photons generated by the light source. We can make sense of this by assuming that our perceptions are representational, and that our brain has a code for “light”, and a code for luminosity. When we see a light with a given luminosity, our brain recognizes it as a light, and generates a code for “light”, and then attaches a code for the perceived level of luminosity, which then triggers a sense impression of a light with a particular brightness. This view assumes that what we experience as sight is quite literally a representation, triggered by codes that are generated by our senses.

This view might be philosophically troubling, but we’ll see shortly that it produces the right answers, so, as a pragmatist, I’m willing to consider the possibility that what I perceive is what Kant would call phenomena (i.e., a representation of a thing), and not noumena (i.e., the underlying thing itself). In some sense, this is trivially the case, since we interact with the world through our senses, and as a result, our senses are our only source of information about the external world. But when we measure luminosity with something other than our eyes, it turns out that there’s a systematic disconnect between perceived luminosity, and actual measured luminosity, suggesting that there is real physical meaning to Kant’s distinction between phenomena and noumena. That is, the fact that human beings perceive luminosity as a logarithmic function of actual measured luminosity suggests the possibility that, as a general matter, our perceptions are representational.

Leaving the philosophy, and returning to the mathematics, this view implies that the total signal information generated by our sense organs upon observing a light source with an actual luminosity of $L$ should be given by,

$i = \log(L) + C$,

where $C$ is a constant that represents the length of the code for perceiving light.

Note that, in this case, $i$ is not the information content of a Shannon code. Instead, $i$ is the raw signal information generated by our sense organs upon observing the light source itself. So let’s take things a step further, and assume that our sense organs are efficient, and make use of compression, which would mean that at some point along the set of processes that generate our perceptions, the initial code with a length of i is compressed. Further, for simplicity, let’s assume that it’s compressed using a Shannon code. In order to do so, we’ll need an ex ante probability. Again, we’re not suggesting that this probability is the “correct” probability of observing a particular signal, but rather, that it is a useful prior.

We can accomplish this by taking Shannon’s equation and solving for p. Specifically, we see that given an amount of information $i$, the probability associated with that information is given by,

$p = 1/2^i$ .

Again, we can agonize over the physical meaning of this probability, and say that it’s the correct ex ante prior for observing a given amount of information $i$. This has some intuitive appeal, since intuition suggests that as a general matter, low information events should be far more likely than high information events, which could explain why we find unusual events exciting and interesting. But we don’t need to take it that far. Instead, we can assume that our sense organs make use of a universal prior of this sort because it’s useful as a practical matter, not because it’s actually correct as an objective matter.

In this case, note that $p = 1/L$ (ignoring the constant term $C$), so plugging $p$ back into Shannon’s original equation, we obtain the following:

$I = \log(L)$.

In short, we end up at the same place. This suggests that whether or not our sense organs make use of signal compression, in the specific case of perceiving luminosity, we end up with a code whose length is given by the logarithm of the luminosity of the source, which is consistent with how human beings actually perceive luminosity.

In summation, human beings perceive luminosity logarithmically as a function of actual luminosity because what we are perceiving is a representation of the information content of a light source, not actual light, and not a representation of light.

If we take an RGB channel value as a proxy for luminosity, we can now finally express the information content of a single color channel value $x$ as simply $I = \log(x)$.

Color and Entropy

Luminosity is of course only one aspect of color, since colors also convey actual color information. To address this aspect, I assume that color itself is the consequence of the brain’s distinctions between the possible distributions of luminosity across different wavelengths of light. That is, a light source will have some total luminosity across all of the wavelengths that it generates, and the color ultimately perceived is the result of the distribution of that total luminosity among the individual wavelengths of light produced by the source. This is consistent with the fact that scalar multiples of a given RGB vector don’t change the color that is generated, but instead simply change the luminosity of the color.

Let’s consider the specific case of a given color vector $(x y z)$. The total luminosity of the vector is simply $L = x + y + z$. As a result, we can construct a distribution of luminosity given by,

$(p_1 p_2 p_3) = \frac{(x y z)}{L}$.

We can then take the entropy of $(p_1 p_2 p_3)$, which will give us a measure of the diffusion of the luminosity across the three channels. The maximum diffusion occurs when each channel has an equal amount of luminosity, which will produce no color at all, and scale from black to white, suggesting that color itself is the perceptual result of asymmetry in the distribution of information across wavelengths of light. In short, a high entropy color contains very little color information, and a low entropy color contains a lot of color information.

Comparing Colors Using Information Theory

Combining the information content of the color channels of an RGB vector, and the entropy of the vector, we can construct an overall measure of the difference between two colors $(x y z)$ and $(a b c)$ as follows:

$\delta L = ||\log((x y z)) - \log((a b c))||$,

and,

$\delta H= (H_1 - H_2)^2/\log(3)^2$,

where $H_1$ and $H_2$ are the respective entropies of $(x y z)$ and $(a b c)$. That is, we take the logarithm of the color vectors, then we take the norm of the difference, and this will give us a measure of the difference in luminosity between two colors. Then, we take the difference between the entropies of the two color vectors, which will give us a measure of how different the two colors are in terms of how much color information they convey. Note that each of $H_1$ and $H_2$ are necessarily less than or equal to $\log(3)$.

Though we can of course consider these two metrics separately, we can also combine them into a single metric that allows us to compare two colors:

$\delta T = 50 \delta L (\delta H + 1)$.

Since the logarithmic scale creates small differences between colors, I’ve chosen a multiplier of 50 to make the differences more noticeable, but this is otherwise an arbitrary scalar.

I’ve attached a color bar that begins with black (0 0 0), and increases linearly by a factor of 25.5 along the blue channel, together with a graph that shows the difference between each pair of adjacent colors, as measured by $\delta T$. As you can see, the colors become more similar as we traverse the color bar from left to right, despite the fact that their distances in Euclidean space are constant.

I’ve also attached another color bar that iterates through eight colors, but in this case, the increment moves from one channel to another, as if we were counting from 0 to 7 in binary using three bits.

The actual colors used in this case are as follows:

0 0 0
0 0 128
0 128 0
0 128 128
128 0 0
128 0 128
128 128 0
128 128 128

As you can see, yet again, the measured differences in color reflect how human beings actually perceive changes in color. Note that the greatest difference is between a non-primary blue, and primary red, which is intuitively correct both analytically (since there’s no intersection between the two colors) and visually (since there is a great deal of contrast between the two colors).

Perception and Symbolic Computation

In short, if we assume that human beings perceive colors as encoded representations of underlying physical objects, then by applying principles of information theory, we can generate equations that accurately reflect how human beings actually perceive colors. Taken as a whole, this is rather remarkable, since it suggests that the human body might, at least in some cases, operate like an efficient processor of representational information. This implies that our perceptions are the result of symbolic computations that manipulate representations, rather than one-to-one approximations of external objects. In less abstract terms, this view suggests that the way we perceive objects is more like the abstract symbolic representation of the number π , than a specific decimal approximation of the number represented by that symbol. In short, it could be that human beings not only think in terms of symbolic representations, but that we actually perceive in terms of symbolic representations. If correct, this suggests that attempts to mimic human intelligence should, at least in part, be focused on intelligent symbolic computation, and not just statistical techniques such as machine learning.