# Comparing Sequences of Labels

Just imagine monitoring the programs running over time on a CPU. It is obviously common to have multiple programs running at the same time, though typically you have only one program that is contained in some active window. However, even though a program is in the background, it will still enter the processor from time to time. As a result, we can imagine the programs coming in and out of the processor over time as a sequence expressed as a vector $(p_1,p_2,\ldots,p_k)$. Note that because there will likely be only one active window at a time, there will probably be some program that appears most often in memory, over any given period of time, though this isn’t terribly relevant to the more general concept I’m introducing, which is comparing sequences of labels. Specifically, in this case, it doesn’t mean anything to take the difference between $p_1$ and $p_2$, because they’re just labels that don’t have any intrinsic numerical value.

Nonetheless, we could for many reasons want to compare two sequences of labels. For example, just imagine some programs use more electrical power than others. You could for this reason, want to compare two sequences of labels so that you could predict what programs will follow, or how long a given program will remain open, etc. This would in turn, allow you to predict power consumption, CPU usage, battery life, etc. And I came up with a method earlier today that is quite simple, and allows you to compare arbitrary sequences of labels that can be different lengths, and contain different labels.

I’ll begin by example to establish an intuition, so let $a = (1, 2, 2, 5)$ and let $b = (1, 3, 2)$. Note that we are not treating the entries of the sequences as numbers, but instead as labels, so we could have used “apple”, “orange”, “quince”, rather than numbers, as it doesn’t matter, and you’ll see why in just a moment. The sequence $a$ is longer than $b$, so let’s begin with $a$, and begin with the first element $1$. We then search for $1$ in sequence $b$, to find that it is in exactly the same index, and so we assign it a score of 1, as if it contributed 1 full element to the intersection of two sets, and so the total pseudo-intersection score between $a$ and $b$ is thus far now 1. We then search for $2$, as it is the second element of $a$, to find that it is in index 3 of sequence $b$. Because it is not in exactly the same index, we treat it as contributing less than 1 full element to the pseudo-intersection of sequences $a$ and $b$. You could do this differently, but the equation I’m going to use is as follows:

$1 - \frac{N}{N+1}$,

where $N$ is the distance between the indexes in question. So in this case, because the $2$ from sequence $a$ is in index 2, and the $2$ from sequence $b$ is in index 3, $N = 1$, and so the pseudo intersection score is $\frac{1}{2}$. If we continue this process, we produce a total pseudo-intersection of $2.5$, since the second $2$ from $a$ is in index 3, and $5$ does not exist in sequence $b$. As is evident, the further away a given entry is from a corresponding entry, the less it contributes to the total pseudo-intersection, and if it’s not there at all, then it contributes nothing.

This would allow you to use my intersection clustering software to cluster these types of sequences, and make predictions (maximally intersecting sequences are the nearest neighbors of each other), but more interesting than that, this makes some sense of fractional cardinalities, which is something I’ve been working on lately in the context of the logarithm of fractions. Specifically, the information capacity of a system that can be in $N$ states, is $\log(N)$, but what does it mean for a system to have a fractional number of states? You could view it as Quantum Superposition, since you could say, every unit of energy within the system is subdivided among more than one state, meaning the energy of the system is always subdivided beyond unity, producing a fractional number of states for each unit of energy. You could also, say it’s the result of a pseudo-intersection of this type, that doesn’t really have any clear physical meaning, but has a mathematical meaning, as described above.

In any case, you can plainly use this technique to predict draws on capacity of all types, from food inventories and electricity to CPU’s themselves, and though it doesn’t offer any obvious means of making batteries or processors more efficient, the reality is, planning always allows for more efficiency.