I’m working on the applications of A.I. to thermodynamics, and I had to solve for how to split a dataset into two parts, using objective criteria. Specifically, I’m interested in what’s moving, and what’s not, but the algorithm I came up with is general, and can be used to split any dataset in two, using objective criteria.
It is only a slight tweak on my original clustering algorithm, that requires an additional outer loop that iterates through levels of granularity, because you end up with a coin toss distribution, which produces very slight changes in entropy.
The “final_delta” variable is the threshold value that divides the dataset, so you can test for everything under or over that value, and that’s the dividing line.
The attached code splits a dataset of 50 million real number values in about thirty seconds, running on an iMac.
Command line code:
Full set of algorithms: