Complexity of Motion

I’ve done a ton of work on the complexity of motion and gestures, some of it specifically applied to robotics, and much of it unpublished, which annoyed me. So I’m pleased that the paper I just got up and running (not quite final yet), Sorting, Information, and Recursion, includes what is plainly the culmination of this work: Equations (1) and (2) of the paper, which allow you to measure the entropy of a sequence of motions.

Specifically, represent the velocity of a system over time as a sequence of velocity vectors (v_1, \ldots, v_k). If you can express the difference between each pair of adjacent vectors (v_i, v_{i+1}) as either a real number or a real-valued vector, then you can use the equations in that paper to calculate an order-dependent analog of entropy, which I discuss in some detail. It behaves exactly the way you’d want: if the motions are highly volatile in sequence, you get a higher entropy, and if the velocity is constant, you get an entropy of zero. I discuss this in more detail in the paper, in particular in Footnotes 5 and 6.

You can look at it three ways: (1) take the sequence (v_1, \ldots, v_k) as it is observed, (2) sort it, which will minimize the entropy (see Corollary 3.1), or (3) apply another ordering that will maximize the entropy (see Footnote 8). These three measures tell you (1) what the actual sequence entropy is, (2) what its theoretical minimum is, and (3) what its theoretical maximum is. This could be useful where you don’t have total control over the sequence itself, and can only set the individual velocities, giving you an objective criterion for comparing the complexities of two sets of motions. The lower the entropy, the smoother the motion should look to an observer, which matters not just for robotics, but for all modes of transportation, since people are naturally frightened by sudden acceleration.
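
Because Equations (1) and (2) themselves aren’t reproduced here, the following is only a rough Octave sketch of the idea: sum the information needed to encode each adjacent velocity difference. The specific \log_2(1 + \|v_{i+1} - v_i\|) term, and sorting by magnitude as a proxy ordering, are my own illustrative choices, not the paper’s formulas. It is zero for constant velocity and grows with volatility.

% Stand-in, order-dependent entropy: information to encode each adjacent difference.
H = @(V) sum(log2(1 + sqrt(sum(diff(V, 1, 1) .^ 2, 2))));

V = [3 0; -1 2; 0 0; 4 -3];                  % an observed sequence of velocity vectors
[mags, idx] = sort(sqrt(sum(V .^ 2, 2)));    % proxy ordering: sort the vectors by magnitude
printf("observed: %f\n", H(V));              % higher when the motion is volatile
printf("sorted:   %f\n", H(V(idx, :)));      % lower in this example
printf("constant: %f\n", H(ones(5, 2)));     % exactly zero for constant velocity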

Comparing Sequences of Labels

Imagine monitoring the programs running over time on a CPU. It is of course common to have multiple programs running at the same time, though typically only one program is contained in the active window. Even a program in the background will still enter the processor from time to time. As a result, we can represent the programs coming in and out of the processor over time as a sequence expressed as a vector (p_1, p_2, \ldots, p_k). Because there will likely be only one active window at a time, some program will probably appear most often in the sequence over any given period of time, though this isn’t terribly relevant to the more general concept I’m introducing, which is comparing sequences of labels. Specifically, in this case it doesn’t mean anything to take the difference between p_1 and p_2, because they’re just labels that don’t have any intrinsic numerical value.

Nonetheless, there are many reasons we might want to compare two sequences of labels. For example, some programs use more electrical power than others, so you might want to compare two sequences of labels in order to predict which programs will follow, how long a given program will remain open, and so on. This would in turn allow you to predict power consumption, CPU usage, battery life, etc. I came up with a method earlier today that is quite simple, and that allows you to compare arbitrary sequences of labels, even if they have different lengths and contain different labels.

I’ll begin with an example to establish an intuition, so let a = (1, 2, 2, 5) and b = (1, 3, 2). Note that we are not treating the entries of the sequences as numbers, but as labels, so we could just as well have used “apple”, “orange”, and “quince” rather than numbers; it doesn’t matter, and you’ll see why in a moment. The sequence a is longer than b, so we begin with a, and with its first element, 1. We search for 1 in sequence b and find it at exactly the same index, so we assign it a score of 1, as if it contributed one full element to the intersection of two sets; the total pseudo-intersection score between a and b is thus far 1. We then search for 2, the second element of a, and find it at index 3 of sequence b. Because it is not at exactly the same index, we treat it as contributing less than one full element to the pseudo-intersection of sequences a and b. You could do this differently, but the equation I’m going to use is as follows:

1 - \frac{N}{N+1},

where N is the distance between the indexes in question. In this case, because the 2 from sequence a is at index 2 and the 2 from sequence b is at index 3, N = 1, and so the pseudo-intersection score is \frac{1}{2}. If we continue this process, we produce a total pseudo-intersection of 2.5, since the second 2 from a is at index 3 of b (a score of 1), and 5 does not appear in sequence b at all (a score of 0). As is evident, the further a given entry is from its corresponding entry, the less it contributes to the total pseudo-intersection, and if it’s not there at all, it contributes nothing.
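
Here is a minimal Octave sketch of this score, treating the entries as labels (strings). Where a label occurs more than once in b, matching the nearest occurrence is my own assumption, since the text above doesn’t specify which occurrence to use.

a = {"1", "2", "2", "5"};                 % labels, not numbers
b = {"1", "3", "2"};
score = 0;
for i = 1:numel(a)
  idx = find(strcmp(b, a{i}));            % all indexes of label a{i} in b, if any
  if ~isempty(idx)
    N = min(abs(idx - i));                % distance to the nearest matching index
    score = score + 1 / (N + 1);          % i.e., 1 - N/(N+1)
  end
end
printf("pseudo-intersection: %f\n", score);   % prints 2.5 for this example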

This would allow you to use my intersection clustering software to cluster these types of sequences and make predictions (maximally intersecting sequences are each other’s nearest neighbors). But more interesting than that, this makes some sense of fractional cardinalities, which is something I’ve been working on lately in the context of the logarithm of fractions. Specifically, the information capacity of a system that can be in N states is \log(N), but what does it mean for a system to have a fractional number of states? You could view it as quantum superposition: every unit of energy within the system is subdivided among more than one state, meaning the energy of the system is always subdivided beyond unity, producing a fractional number of states for each unit of energy. You could also say it’s the result of a pseudo-intersection of this type, which doesn’t have any clear physical meaning, but does have a mathematical meaning, as described above.

In any case, you can plainly use this technique to predict draws on capacity of all types, from food inventories and electricity to CPUs themselves, and though it doesn’t offer any obvious means of making batteries or processors more efficient, the reality is that planning always allows for more efficiency.

Sorting, Information, and Recursion

I’ve updated the paper that discusses some of the recent results I’ve proved on this blog to include heavily annotated code, given that there seems to have been some confusion on Reddit about the runtime of the algorithms.

The results prove that (1) a list is sorted if and only if the distance between adjacent entries is minimized, and (2) a list is sorted if and only if an encoding of the list as a particular class of recurrence relation minimizes information. Together, these results demonstrate an equivalence between sorting and minimizing information, which is, in my opinion, non-obvious. I also introduced what I believe to be the fastest known sorting algorithms, one with an O(N) runtime and another with an O(\log(N)) runtime, both of which are faster than the O(N\log(N)) lower bound for serial sorting algorithms (both algorithms are parallel). The code for both is attached to the paper linked to above, which you can also find below.

 

On Sorting and Nearest Neighbor

This is now a formal paper, available here.

It dawned on me yesterday that sorting a list of numbers is equivalent to minimizing the distance between adjacent entries in the list. Sorting has obviously been around for a long time, so this could easily be a known result, one that perhaps even appears in college textbooks and that I simply forgot, but that’s not the point –

The point is, it suggests a more general notion of sorting that would apply to all mathematical objects for which there is a measure of distance F: S \times S \rightarrow \mathbb{R},

where S is the set of objects in question. That is, if you can compare every pair of objects and map the difference between them to the real line, then you can define a partial order on the set that minimizes the distance between adjacent pairs in some directed graph, and doing so is the abstract analog of sorting a list of numbers. The reason this is important, to me at least, is that it allows the measure of entropy I developed for planar waves to be applied to arbitrary sets of objects that meet this requirement.
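
As a rough illustration of this more general notion, here is an Octave sketch that orders arbitrary objects under a distance measure F by greedily chaining each object to its nearest unused neighbor. The greedy chaining and the starting point are my own illustrative choices, not a construction from the text; the result only approximates an ordering that keeps adjacent distances small.

X = [0 0; 5 5; 1 0; 4 5];                 % arbitrary objects, here 2-D vectors
F = @(u, v) norm(u - v);                  % any distance measure F: S x S -> R
n = rows(X);
order = zeros(1, n);
used = false(1, n);
order(1) = 1;                             % start arbitrarily from the first object
used(1) = true;
for j = 2:n
  d = inf(1, n);
  for i = 1:n
    if ~used(i)
      d(i) = F(X(order(j - 1), :), X(i, :));   % distance from the last placed object
    end
  end
  [dmin, nxt] = min(d);                   % nearest unused neighbor
  order(j) = nxt;
  used(nxt) = true;
end
disp(order)                               % 1 3 4 2 for this example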

Lemma. A list of real numbers (a_1, a_2, \ldots, a_k) is sorted if and only if the distance |a_i - a_{i+1}| is minimal for all i.

Proof. Assume the list is sorted in ascending order, and that the lemma is false. Because the list is sorted, the distance |a_1 - a_2| must be minimal for a_1, since by definition all other elements are greater than a_2. Similarly, the distance |a_{k-1} - a_k| must be minimal for a_k. Therefore, there must be some a_m for which either –

(1) |a_i - a_m| < |a_i - a_{i+1}|,

or,

(2) |a_{i+1} - a_m| < |a_i - a_{i+1}|.

That is, because we have eliminated the first and last entries in the list, in order for the lemma to be false there must be some pair of adjacent entries and some third entry whose distance to one of the pair is less than the distance between the pair itself. Because assuming a_m < a_i or a_m > a_{i+1} simply changes the indexes in the proof, it must be the case that a_i < a_m < a_{i+1}, which in turn contradicts the assumption that the list is sorted.

The proof in the case of a descending list is analogous.

Now we will prove by induction that if |a_i - a_{i+1}| is minimal for all i, then the list is sorted:

Assume we start with a single item, a_i. Then we want to insert some new item, a_j, to generate a list in ascending order (the proof for descending order is analogous). Because there are only two items, a_i is the nearest neighbor of the new item a_j (i.e., the distance between a_i and a_j is minimal). If a_j > a_i, then we insert a_j to the right of a_i; otherwise, we insert a_j to the left of a_i.

Now assume the lemma holds for some number of insertions k \geq 2, which implies a sorted list (a_1, a_2, \ldots, a_k). If there is more than one nearest neighbor for a given insert (i.e., two items of equal distance from the inserted item) that are equal in value, then we find either the leftmost or the rightmost such item, depending upon whether the inserted item is less than or greater than its nearest neighbor, respectively. If the insert is equidistant between two unequal adjacent items, then we insert it between them.

Now assume we are to insert item a_m, which will be insertion number k + 1. Further, assume that following this process causes the resultant list of k + 1 items to be unsorted, solely as a result of this insertion (note we assumed the list is sorted beforehand), and assume that the indexes are such that (a_1, a_2, \ldots, a_k) is the correct set of indexes for the sorted list prior to the insertion of a_m.

Now assume the process above places a_m at the front of the list. Because the list is unsorted, it must be that a_m > a_1, but this is impossible, according to the process above, which would place it to the right of a_1. Now assume the process above places a_m at the end of the list. Because the list is unsorted, it must be that a_m < a_k, but again, the process above dictates an insertion to the left.

Therefore, since by assumption the list is unsorted, it must be the case that a_m is inserted between two adjacent items in the list, a_i and a_{i+1}. By the process described above, the distance to at least one of them must be minimal, so assume that a_i is the nearest neighbor of a_m. Again, since by assumption the list is out of order, it must be the case that either –

(1) a_i > a_m,

or

(2) a_m > a_{i+1}.

In case (1), again, the process above dictates insertion to the left of a_i, which eliminates this case as a possibility. In case (2), we have a_m > a_{i+1} \geq a_i, so a_{i+1} is at least as close to a_m as a_i is, which means a_{i+1} must be the nearest neighbor of a_m, leading to a contradiction. The cases where a_{i+1} is the nearest neighbor of a_m are analogous, which completes the proof. □

Though there are some wrinkles I haven’t worked out, this alludes to a linear-runtime sorting algorithm, not only for real numbers, but for anything capable of comparison that maps to the real numbers. This is because the nearest neighbor algorithm itself can be fully vectorized into a linear-runtime algorithm on a sufficiently parallel machine. The problem is that some of the steps described above cannot be vectorized, at least in Matlab. However, I think a working algorithm, not covered by the proof above as stated, would always look for the nearest unequal neighbor that a given item is less than. That is, for a given item, the algorithm returns the nearest item that is strictly greater than the item in question (call this “the least larger neighbor”). Then you simply have a matrix, with a number of rows and columns each equal to the number of items, which in this case represents a directed graph. For a_i, if the least larger neighbor is a_j, you set all entries in row i to zero, except for column j, which you set to one, denoting an edge from a_i to a_j. The result is no different than a sorted linked list.

Attached is Octave code that implements this algorithm (not the one from the proof), which seems to work, though I have not yet proved that it does. It’s set up to sort unique lists of numbers, so if you have duplicates in your list, simply insert them after the fact, which again takes O(N) steps on a vectorized machine: search for each duplicate entry and insert it to the left or right of its copy. This produces a sorting algorithm with a worst-case O(N) runtime on a parallel machine, and I’ll follow up with a proof that it works (assuming it does). If you implement the minimum operator in O(\log(N)) parallel steps, which can be done by successively taking the difference between adjacent terms and deleting the left-hand terms that produce a positive difference (i.e., if a - b is positive, then a cannot be the minimum element), then the sorting algorithm has a runtime of O(\log(N)) parallel steps, assuming no duplicate entries.
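
The attached code isn’t reproduced here, so the following is my own minimal Octave sketch of the least-larger-neighbor construction described above, for a list of unique numbers. The serial loop is only for clarity; each row of the adjacency matrix depends only on its own entry, so the rows could be computed independently on a parallel machine.

x = [7 3 9 1 4];                          % unique entries, per the text
n = numel(x);
G = zeros(n, n);                          % adjacency matrix of the directed graph
for i = 1:n                               % each iteration is independent of the others
  larger = find(x > x(i));                % entries strictly greater than x(i)
  if ~isempty(larger)
    [lval, k] = min(x(larger));           % the "least larger neighbor" of x(i)
    G(i, larger(k)) = 1;                  % edge x(i) -> least larger neighbor
  end
end
[mval, cur] = min(x);                     % walk the chain from the minimum element,
sorted = zeros(1, n);                     % exactly like traversing a sorted linked list
for j = 1:n
  sorted(j) = x(cur);
  cur = find(G(cur, :), 1);
  if isempty(cur), break; end
end
disp(sorted)                              % 1 3 4 7 9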

Returning again to the measure of entropy linked to above, you can view it as measuring how much information you need to represent a set of numbers using recursion. For example, if you sort the set of numbers \{1,2,3\}, then you can express the set as the sequence (1,1,1), where you begin with 0 and add each entry of the vector in order, i.e., (((1) + 1) + 1). If you first sort the set of numbers in question, you will minimize the amount of information required to encode the set as a recursive sequence, because it takes less information to encode smaller numbers than larger numbers. Note that this also requires less information than encoding the set itself. This suggests an equivalence between sorting a set of numbers and minimizing the amount of information required to represent that set using recursion of this form.
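
Here is a minimal Octave sketch of that comparison: encode a list as its first element followed by the successive differences between adjacent entries, and count the bits needed per term. The \log_2(1 + |d|) bit count is my own illustrative stand-in, not the paper’s exact information measure.

x = [3 1 2];                                      % an unsorted list
d_unsorted = [x(1), diff(x)];                     % first element, then adjacent differences
d_sorted   = [min(x), diff(sort(x))];             % same encoding after sorting: (1, 1, 1)
bits = @(d) sum(log2(1 + abs(d)));                % rough per-term bit count (my own choice)
printf("unsorted: %f bits\n", bits(d_unsorted));  % larger differences, more information
printf("sorted:   %f bits\n", bits(d_sorted));    % smaller differences, less information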

You can then generalize this to vectors, and encode a set of vectors, as a sequence of vectors expressed analogously in some order.

All of this leads to the following conjectures, which I’m simply not taking the time to even attempt to prove:

Conjecture 1. There is no sorting algorithm that can generate a total order on a set of unique numbers in less than O(\log(N)) steps;

Conjecture 2. There is no sorting algorithm that can generate a partial order on a set of numbers (allowing for duplicates) in less than O(\log(N)) steps;

Conjecture 3. There is no sorting algorithm that can generate a total order on a set of numbers (allowing for duplicates) in less than O(N) steps,

in each case, where N is the number of items in the set.

Motion Classification Using Time-Series Average

Continuing with recent articles on gesture classification and other user interface classification techniques, I realized late yesterday that you can classify some motions using a simple time-average that produces motion blur, provided the motions are simple enough. In this case, it’s the same gesture dataset I’ve been working with in the past, in which I raise either my left hand or my right hand. Taking a time-average of the images in a given sequence (i.e., movie file) produces motion blur on one side of my body. This in turn produces a single-image classification task, to which I apply my standard image classification algorithm. That is, each sequence of images produces a single image that contains motion blur. I don’t think this is needed for the touch-screen based user interface I’m working on, but I thought it was interesting, since it takes what is arguably a hard problem and turns it into a basic single-image classification task. This obviously won’t work for all tasks, but it will work for obvious gestures and other simple motions. Specifically, it won’t work if the order of the motions in a gesture is important, since averaging destroys information about the order in which the underlying motions occurred.

The time-series average of one gesture movie file, shown as a single image.

In this case, the accuracy is perfect, though this is plainly a very simple dataset. I need to do more careful analysis to get a better sense of the actual runtime, but initial testing suggests it can process at least one frame per second, and I think more careful measurement of the runtimes of the algorithms themselves (as opposed to, e.g., loading and converting the images to grayscale) would yield a lower runtime. Further work on this isn’t rational, though, as I don’t seem to need it for anything. In terms of process, the algorithm calculates the average for every pixel position over a sequence of images, and does so in a vectorized manner. On a truly parallel machine, the runtime of this step would depend only upon the number of images (i.e., the number of summands), not the number of pixels (i.e., the number of independent averages being calculated). Because the number of pixels is large, it is of course possible that this is not fully vectorized on, e.g., my iMac, but it’s still really fast, and it works.
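
As a minimal Octave sketch of the time-average step, assume the frames of one gesture movie have already been loaded as an h-by-w-by-k grayscale image stack; the loading helper and classifier named in the comments are hypothetical placeholders, not functions from this post.

frames = rand(240, 320, 30);              % stand-in for a real h-by-w-by-k image stack
% frames = load_gesture_movie("left_hand_01.avi");   % hypothetical loading helper
blurred = mean(double(frames), 3);        % per-pixel average over the k frames (motion blur)
% label = classify_image(blurred);        % hypothetical single-image classifier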

Here’s the command line code: