# A Note on Contingent Acceleration

I just saw an airplane turn on my walk home, and it dawned on me how strange it is that a machine can change its own velocity, which you could argue is a consequence of how I think about time. Specifically, I think about time as a set of possible outcomes along a discrete graph, simply because it’s a practical way to think about how a system can change, especially if you’re writing software to model its behavior. That is, if the system is in a given state, then there is some set of possible next states, and so on, which will produce a discrete graph. I happen to think that time is actually physically structured this way, but what’s relevant for this discussion, is that you can certainly model time as a discrete graph, as a practical matter.

Returning to the airplane, when it’s heading straight ahead, there will be some set of possible next states for the airplane’s motion at any given moment. But if the airplane were an ordinary projectile, this would not be the case, macroscopically, and instead, it would have a simple parabolic trajectory that would be unchangeable, absent intervention. So an ordinary projectile moving through space near Earth, doesn’t really have an interesting path through time, but is instead deterministic, at least at our scale of observation, absent intervention. In contrast, because an airplane can turn, increase velocity, or perhaps tilt, at any given moment, the next moment in time could always be some legitimately different state, leading to a multiplicity of possible paths through time, representing the future outcomes that are possible at any given moment.

As an example, let’s assume the plane is flying itself, which is realistic, and during the flight, the plane responds to some weather data, that causes it to change its course, turning the plane in a particular direction. Tracing the events that lead to this outcome, we have an exogenous event, which is the weather itself, then we have the systems within the airplane, receiving a signal representing the change in the weather, and then a response in the software piloting the plane, and ultimately the physical mechanisms of the plane itself, within the engines and the wings, responding to the software’s instructions.

It’s not as if any of this violates the conservation of momentum, since the plane is powered by fuel and electricity, to which there’s no magic, though the plane’s role in this set of facts requires electrical and mechanical energy to be already extant within the plane, in order for it to first receive the transmission, calculate the appropriate response, and then physically execute that response. However, what is in my opinion fascinating, upon reflection, is that the ability of systems of this type to respond to exogenous signals, allows these systems to have complex paths through time, that are contingent upon information that is exogenous to the system. In this case, even when piloting itself, the airplane can effectively decide to change directions, meaning that at any given moment, its path through time is very different from any ordinary, inanimate projectile.

All systems can of course be accelerated by exogenous momentum, which is typically a collision. But this is different: this is instead the case of a system that is ultimately accelerated by an exogenous piece of information. It is ultimately the ability to respond to information about an environment that makes the path of a system like this through time fundamentally different from any inanimate system. Though airplanes are obviously distinguishable from living systems, what I suppose I never fully appreciated, is that they arguably have more in common, in terms of their behavior through time, with living systems, than inanimate systems, since they are capable of acceleration that is contingent upon exogenous information.

To make it perfectly clear, we can distinguish between, for example, the position of the wheel of a car, and the information transmitted to the airplane: the position of a wheel is a physical state of the car itself, whereas the transmission regarding the weather is a representation of the state of an exogenous system, that is transmitted to the plane, that in turn causes a response within the plane itself. The point being, that a plane can respond to a representation of the state of an exogenous system, which really is the ability to autonomously respond to information, as opposed to being accelerated by some circumstance physically connected to the plane.

# Optimization Algorithm

Assume we’re given a known domain $[a,b]$ for a function $F$, and that the function can be calculated (i.e., it is known to the algorithm). Further, assume we also have a known goal value $F(x_g) = y_g$ (e.g., a zero of the function). The purpose of this algorithm is to find the domain value $x_g$ that generates the goal value $y_g$.

Begin by evaluating the function some fixed number of times $K$ at random points in the domain, which will produce a set of $K$ range values. Then select the value $x$ for which $|y - y_g|$ is minimum, where $y = F(x)$. That is, you select the domain value that generates a range value that is closest to the goal value.

Assume that $\mu$ is the average difference between the range values and the goal value over the domain values selected. That is, we calculate the difference between each range value $y_i$ and the goal value $|y_i - y_g|$, and calculate the average over those differences.

Evaluate the following –

$\epsilon = \frac{|y - y_g|}{\mu}$.

That is, $\epsilon$ allows us to compare the difference between the distance from our best range value and the goal value, and the average distance between range values and the goal value.

Then evaluate the function another $K$ times in the domain, using $x$ as the seed value for the points that are randomly selected, over a range of $x \pm \epsilon \frac{b - a}{2}$. This will cause the domain to shrink as we get closer to the goal value.

Repeat this process until $|y - y_g|$ is within some tolerable error.

Attached is code that implements this algorithm, that can find the zero of a parabola, but it can be easily augmented to include any function. Note that this algorithm would work analogously in any dimension of Euclidean space.

The algorithm is not deterministic, and the runtime will of course depend upon your tolerance for error. In the case of a simple parabola, it can find the zero over a range of [-5,5], with a tolerance of $10^{-5}$, in about 0.005 seconds.

8_30CMNDLINE

least_error_optimization

# Identifying Periodic Behavior

Attached is an algorithm that can identify periodic behavior in data, and cluster the data, with values that appear in the same stage of the sequences identified being clustered together. The command line contains a simple example where a binary string has a known gap between the bits set to ‘1’, and the algorithm is tasked with identifying the size of the gap between the ‘1’s, thereby identifying the period of the dataset.

For example, the string $s = 100100100 ...$ has a gap of two between each of the ‘1’s, and the algorithm can identify this type of structure in polynomial time. You can then use basic modular arithmetic to predict entries beyond the observed dataset, assuming the patterns hold.

You can have multiple periods, you can add noise, and in a follow up article, I’ll combine this algorithm with my thermodynamics software, which will allow the algorithm to identify the cycles in a combustion engine. This is actually quite easy, and all you need to do is substitute the operator in the attached code with my mass-classification operator. This software would probably be useful for astronomy as well, since astronomers deal with complex cyclical phenomena all the time, due to orbits, rotations, etc.

8_24_CMNDLINE

linear_clustering

# A Note on Provability

It is necessarily the case that you can eventually generate all possible finite proofs on a UTM. It also necessarily the case that you can eventually generate all possible theorems on a UTM. This follows trivially from the fact that both proofs and theorems are finite collections of characters from some finite alphabet. As a result, it is necessarily the case, that you can generate all proofs, and all theorems, given enough time.

In this view, where we mechanistically generate all theorems and proofs, being able to prove a theorem is being able to connect a given theorem to its proof, since as demonstrated, generating all possible theorems and proofs is trivial, given enough time on a UTM. At the same time, Gödel’s First Incompleteness Theorem states that it is not as a general matter possible to prove or disprove all statements within a formal system. Therefore, Gödel’s theorem implies that we cannot map every theorem to its proof, since that would be equivalent to being able to prove every theorem in the system, which is forbidden by the theorem.

As a corollary, it must be the case that there is no computable mapping from the set of theorems to the set of proofs. If you assume otherwise, then you can prove every theorem in the system using a computable function, which would certainly violate Gödel’s theorem.

Does Gödel’s theorem preclude the existence of any such mapping as a general matter?

Or, does it imply that such any such mapping is not computable?

Another way to think about this is if we have a given theorem, we generate all proofs, until we find the proof for the theorem in question. However, we don’t know how long the proof is, even if we’re told ex ante that the theorem is in fact true. As a result, mapping a theorem to its proof in this manner is not decidable, if you assume you can’t know the length of the proof ex ante. The intuition is that a short theorem, could have an extremely long proof. This is arguably the case for the Robertson–Seymour Theorem, which has fairly succinct statements, but an incredibly long and complex proof, that I do not claim to understand.

Taking this to its extreme, we can at least posit the idea of a finite theorem, that requires an infinite proof. This is obviously beyond the realm of human knowledge, since human beings can only write a finite number of symbols over a lifetime. But the point is, such a theorem could exist, and if it does, it would be unprovable.

# A Note on Computability

When I studied the non-computable numbers in college, I remember thinking that existing proofs were pointlessly longwinded, since you could show quite plainly, that because the reals are uncountable, and the number of programs that can be run on a UTM is countable, there are more numbers than there are programs, which proves the existence of non-computable numbers. That is, there aren’t enough programs to calculate all of the reals, and so it must be the case that some real numbers aren’t associated with programs, and are therefore non-computable.

But it just dawned on me, that you could have this result, even without resorting to the fact the reals are uncountable, since it could simply be the case that the set of all inputs to a UTM maps to some countable, proper subset of a countable subset of the reals. Expressed formally, let $S \subset R$, and assume that $S$ is countable. That is, $S$ is a countable subset of the reals, and because it’s countable, it’s a proper subset (i.e., obviously, $S$ does not contain all of the reals). It could be simply be the case that the set of all inputs to a UTM, $A$, maps to only a subset of $S$. That is, even though $S$ is countable, it could could still be the case that $A$ maps to only a subset of $S$, and not the entire set.

Here’s a concrete example: Let $K$ be the set of all non-computable numbers, which you can easily show is uncountable. Now let $S$ be a countable subset of $K$. It follows that there is no mapping at all from $A$ to $S$, since by definition, for each $x \in S$, there is no program that calculates $x$.

So aside from this little result, I suppose the point is, that even if all of the reals don’t exist, you could still be stuck with non-computable numbers.

# New Book: Kate

I’ve started a new book, “Kate”, about a couple living in New York City in the 1960’s. Unlike the last book, I intend for this book to be fairly normal, though because it’s historical fiction, it will likely take me a really long time to write, as I intend to do a lot of research to ensure the work is historically accurate.

Enjoy!

Kate

# On the Limits of Automated Discovery

As noted, I’m working on a physics engine that will allow for the automated discovery of physical laws, just like machine learning is used in many other areas of study, allowing for the automated discovery of useful relationships between observables. While running this morning, I stumbled upon what I thought to be an interesting, related question:

Can a UTM design another machine that is more powerful than a UTM?

This is distinct from a UTM writing a program that is more powerful than a UTM, which is on its face nonsense, since the resultant program will run on a UTM in the first instance. This is instead, the question of whether or not a Turing Equivalent machine can design another machine altogether, that is superior to a UTM?

Let $U$ be some UTM, and assume that a program $f$ causes $U$ to discover the design of some other machine $X$, that is superior to a UTM. It must be the case that the output $Z = U(f)$, is some representation of the machine $X$. If $Z$ is a complete and accurate representation of the machine $X$, then it must be possible to physically build $X$, given the representation $Z$. In plain English, if the computer really discovers another machine, then it must be the case that the computer can tell you how to build the new machine.

As a result, there is some mechanical process that produces the actual machine $X$, given the representation $Z$. This implies that a sufficiently dexterous machine can actually construct $X$ given $Z$, without human assistance. This in turn implies that the construction of the new machine $X$ is a mechanical process, which, according to the Church-Turing Thesis, can be simulated by a UTM.

But that’s not the correct question:

The correct question is whether the behavior of $X$ can be put into a 1-to-1 correspondence with the behavior of a UTM. This turns on the rules of physics, not computer theory. That is, we can’t know the behavior of $X$, without first knowing all of the physics relevant to the behavior of $X$. This doesn’t imply that it is possible for a UTM to design a machine that is superior to a UTM, but rather, that computer theory alone can’t provide an answer to this question.

# Using Vectorization to Measure Velocity

Continuing my work on the applications of A.I. to physics, in this article, I’m going to present very simple software that can take a large number of observations of an object, that has a macroscopic rectilinear motion, and calculate its velocity.

The reason this software is interesting, is because the motions of the objects are random, but nonetheless generate a macroscopically rectilinear motion. This is how real world objects actually behave, in that the parts of a projectile don’t move in the same direction at the same time, but instead have variability that simply isn’t perceptible to human beings, but might nonetheless get picked up by a sensor, making the problem of measuring velocity non-trivial, when given a lot of data.

Though you don’t need to read it to appreciate the software I’ll present in this article, all of this is based upon my model of motion, which I presented in my main paper on physics, “A Computational Model of Time-Dilation” –

I’m simply memorializing the model using code, which will eventually allow for a complete model of physics, including the correct equations of time-dilation, that can be easily analyzed using my entire library of A.I. software.

My ultimate goal is to allow for the autonomous discovery of useful models of physical systems, leaving physicists free to do the higher work of discovering theoretical connections between observables, and then allowing machines to uncover approximations of their relationships, which should in turn assist physicists in their discovery of the correct, exact, mathematical relationships. This software will also be useful for the cases where the correct relationship is impossible to discover, due to intractability, or because it simply doesn’t exist in the first instance, which is a real possibility, if reality is non-computable. There’s also, obviously, the commercial case, where you don’t need the correct equation, and a model that works is all you really need.

A statistical sphere moving in Euclidean space over 16 frames of observation.

The example in this case consists of a statistical sphere, that has 1,000 independently moving points, that change position over a series of 16 frames. The velocities of the points are independent, but selected in a manner that ensures that the relative positions of the points remain approximately constant on average over time. For an explanation of the mathematics underlying the model, see Section 3.3 of the paper linked to above, but you can plainly see combinatorial relationships unfolding in the paths of the individual points, with the outcomes more dispersed in the center of each path, which I’ll explore in a follow up article, clustering the paths of the particles, and generating state graphs.

The method itself is incredibly simple:

The algorithm approximates the origin of the shape in each frame, by taking the average over each of the x, y, and z coordinates, separately, for each frame, assuming the overall shape is a sphere, which it obviously isn’t, since there’s some dispersion. However, it doesn’t matter, because the dispersion is roughly symmetrical, producing a meaningful point from which we can measure displacement.

The rate of displacement in this approximated origin is then calculated for each frame, and compared to the true, average displacement over the entire set of points for each frame. If there are $N$ frames, then this process will calculate the true average velocity for each frame, expressed as an $N \times 3$ matrix, and compare that to the approximated average velocity for each frame, also expressed as an $N \times 3$ matrix (note that the x, y, and z velocities are independent).

Approximating the average velocity takes about 2 seconds running on an iMac, and the  average error is $O(10^{-14})$.

# A Very Minimalist Robot

As noted, I’m working on the intersections between physics and A.I., and though I don’t spend much time thinking about robotics, it comes up now and then, since I think about complexity of motion or gesture, for example. And what I realized is, what you can probably get away with, in most practical circumstances, is a robot that can move, grab objects, and move objects –

This is not going to work for everything, but it will work for many practical real world problems, which usually involve only these things.

A robot comprised of a series of flexible, retractible joints, arranged as a panel, with retractable wheels below, to allow for motion, and grabbing.

Taking it to its extreme, just create a panel that is entirely comprised of powered, flexible joints, with wheels on the bottom. This will allow the panel to change its shape, into pretty much anything, by twisting, contracting, expanding, etc. And if you make the wheels extendible, then a group of wheels can together squeeze an object, operating as a grip, leaving the balance of the robot to make contact with the ground below, thereby allowing the robot to grab and move objects. My software can definitely power something like this, leaving the engineering questions of materials, motors, and electrical power. Note that adding vision to something like this is trivial.

You could even imagine making these capable of being combined, since they would be fungible panels. It shouldn’t take much to make them autonomous, which means they could even combine and separate on their own, based upon the task at hand, allowing for even greater flexibility in problem solving. For example, a group of small panels could break off from the whole, to tackle a problem that requires small scale manipulation. This could allow a large number of small panels to adjust the position of a complex object, that would, for example, then allow for the object to be grabbed by the entire unit, post reassembly. Or panels could partially separate, allowing for the emergence of appendages.

# Variance-Based Clustering

I’ve long noted the connections between standard deviation, entropy, and complexity, and in fact, my first image clustering algorithm was based upon the connections between these measures (See Section 2 of my paper, “A New Model of Artificial Intelligence“). However, the focus of my work shifted to entropy-based clustering, simply because it is so effective. I’ve since discovered an even simpler, third model of clustering, that is based in counting the number of points in a cluster, as you iterate through cluster sizes, and so it is incredibly efficient, since it requires only basic operations to work.

However, it requires a number that allows us to express distance in the dataset, in order to operate. Specifically, the algorithm operates by beginning at a given point in the dataset, and moving out in quantized distances, counting all the points that are within that distance of the original point. Any sufficiently small value will eventually work, but it will obviously affect the number of iterations necessary. That is, if your quantized distance is too small for the context of the dataset, then your number of iterations will be extremely large, causing the algorithm to be slow. If your quantized distance is too big, then you’re going to jump past all the data, and the algorithm simply won’t work.

As noted in this article on identifying macroscopic objects, I initially used my value “delta”, which you can read about in my original paper linked to above, that is reasonably fast to calculate, but nonetheless can start to take time as your dataset has hundreds of thousands, or millions of vectors. Since the goal of this round of articles is to build up a model of thermodynamics, I need to be able to quickly process millions of vectors, preferably tens of millions, to get a meaningful snapshot of the microstates of a thermodynamic system.

What I realized this morning, is that you can take the difference between adjacent entries in the dataset, after it’s been sorted, and this will give you a meaningful measure of how far apart items in the dataset really are. What’s more important, is that this is a basically instantaneous calculation, which in turn allows my new clustering algorithm to run with basically no preprocessing.

Five statistical spheres, after being clustered, colored by cluster.

The results are simply astonishing:

Using a dataset of 2,619,033 Euclidean 3-vectors, that together comprise 5 statistical spheres, the clustering algorithm took only 16.5 seconds to cluster the dataset into exactly 5 clusters, with absolutely no errors at all, running on an iMac.

Complexity of the Algorithm

Sort the dataset by row values, and let $X_{min}$ be the minimum element, $X_{max}$ be the maximum element, and let $N$ be the number of elements. Then take the norm of the difference between adjacent entries, $Norm(i) = ||X(i) - X(i+1)||$, and let $\mu$ be the average over that set of norms.

The complexity is worst-case, $O(N \frac{||X_{min} - X_{max}||}{\mu})$. However, if the dataset consists of $K$ clearly defined objects, then its complexity is $O(K \frac{||X_{min} - X_{max}||}{\mu})$, and is therefore, independent of the number of vectors in the dataset.

This assumes that all vectorized operations are truly parallel, which is probably not the case for extremely large datasets run on a home computer. However, while I don’t know the particulars of the implementation, it is clear, based upon actual performance, that languages such as MATLAB and Octave successfully implement vectorized operations in a parallel manner, even on a home computer.

Octave Algorithms:

8_9CMNDLINE

EMC_anchor_clustering

generate_sphere

Full A.I. Library:

ResearchGate