# Analyzing Microstate Point Data in Thermodynamics

In previous articles, I introduced my algorithms for analyzing mass-scale datasets, including clustering and other algorithms. However, those algorithms deliberately avoided operations on the underlying microstate data, since these datasets are composed of tens of millions of Euclidean vectors.

I’ve now turned my attention to analyzing changes in thermodynamic states, which would benefit from direct measurements of the underlying point data in the microstates themselves. At the same time, any sensible observation of a thermodynamic system is going to consist of at least tens of thousands, and possibly millions, of observations. My algorithms can already handle this, but they avoid the underlying point data, and instead use compression as a workaround.

In contrast, the algorithm below actually clusters an enormous number of real-valued points very efficiently, with the intent of capturing changes in the positions of individual points in thermodynamic systems.
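To make the idea of clustering a long list of real values concrete, here is a minimal Python sketch. It is not the attached algorithm: it swaps in a plain gap-based rule (sort the values, start a new cluster wherever two neighbors differ by more than a threshold), and the names `cluster_1d` and `delta` are hypothetical, chosen only for illustration.

```python
def cluster_1d(values, delta):
    """Label each real value with a cluster index.

    Assumption (not the attached algorithm): values are sorted, and a new
    cluster starts wherever two sorted neighbors differ by more than delta.
    The returned labels are aligned with the original order of `values`.
    """
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for rank, idx in enumerate(order):
        # A gap larger than delta between sorted neighbors opens a new cluster.
        if rank > 0 and values[idx] - values[order[rank - 1]] > delta:
            current += 1
        labels[idx] = current
    return labels


# Toy positions of seven points along one axis.
positions = [0.10, 5.02, 0.12, 9.97, 5.00, 0.09, 10.01]
labels = cluster_1d(positions, delta=1.0)
print(labels)
```

The cost is dominated by the sort, so a rule like this scales to hundreds of thousands of values, though the threshold has to be chosen to suit the data.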

The attached example is a simple one that clusters 225,000 real values in about 120 seconds, running on an iMac.

This is simply ridiculous.

I’m still refining this algorithm, but thought I would share it now, in the interest of staking my claim to it.

The accuracy is just under 100%, though this is a simple example.

You could run this a few times, once on each dimension independently, and then take the clusters that the dimensions produce in common, using an intersection operator, which is extremely fast in MATLAB.

This will allow you to cluster higher dimensional vectors of numbers, though some vectors might end up not getting clustered using this approach. But with this much data, it probably doesn’t matter.
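The per-dimension approach can be sketched in Python as follows. This is an illustration under assumptions, not the attached MATLAB code: the per-dimension cluster labels are taken as given, and the names `clusters_from_labels`, `intersect_clusters`, and `min_size` are hypothetical.

```python
from itertools import product


def clusters_from_labels(labels):
    """Turn a list of per-point labels into a list of index sets, one per cluster."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, set()).add(i)
    return list(clusters.values())


def intersect_clusters(per_dim_labels, min_size=2):
    """Intersect clusters found independently in each dimension.

    per_dim_labels: one list of labels per dimension, all of equal length.
    min_size: hypothetical cutoff; smaller intersections are left unclustered.
    """
    per_dim_clusters = [clusters_from_labels(l) for l in per_dim_labels]
    joint = []
    # Try every combination of one cluster per dimension.
    for combo in product(*per_dim_clusters):
        common = set.intersection(*combo)
        if len(common) >= min_size:
            joint.append(sorted(common))
    return sorted(joint)


# Six vectors, clustered separately along x and along y.
x_labels = [0, 0, 0, 1, 1, 1]
y_labels = [0, 0, 1, 1, 1, 0]
joint = intersect_clusters([x_labels, y_labels])
print(joint)
```

In this toy case vectors 2 and 5 agree with their neighbors in one dimension but not the other, so they end up unclustered. Trying every combination of clusters is only practical when each dimension produces a small number of clusters.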

This would in turn allow you to build a prediction model using any of my prediction algorithms. For my model of prediction, you only need one vector per cluster, and in my admittedly limited experimentation thus far, the number of clusters is quite low, implying that prediction should be fast, despite these enormous datasets.
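As a rough illustration of why one vector per cluster keeps prediction cheap, here is a Python sketch that assigns a new vector to the cluster whose representative is closest. The nearest-representative rule is an assumption standing in for my prediction algorithms, and the names `predict_cluster` and `representatives` are hypothetical.

```python
import math


def predict_cluster(representatives, query):
    """Return the label of the representative vector closest to `query`.

    representatives: dict mapping a cluster label to one vector for that cluster.
    Assumption: plain Euclidean distance is the comparison used here.
    """
    best_label, best_dist = None, math.inf
    for label, rep in representatives.items():
        d = math.dist(rep, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# One representative per cluster, however many points each cluster holds.
representatives = {"A": (0.1, 0.1), "B": (5.0, 5.0), "C": (10.0, 0.0)}
prediction = predict_cluster(representatives, (4.6, 5.3))
print(prediction)
```

Each prediction costs one distance computation per cluster, so the size of the original dataset does not enter into it.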

vectorized_EMC_clustering

CMNDLINE 7-28