Splitting a Dataset

I’m working on applications of A.I. to thermodynamics, and I needed a way to split a dataset into two parts using objective criteria. Specifically, I’m interested in separating what’s moving from what’s not, but the algorithm I came up with is general and can be used to split any dataset in two.

It is only a slight tweak on my original clustering algorithm. It requires an additional outer loop that iterates through levels of granularity, because you end up with a coin toss distribution, which produces only very slight changes in entropy.
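To see why the entropy changes are so slight, here is a small illustrative sketch in Python. It is not the attached code and not the clustering algorithm itself: the function names `binary_entropy` and `split_entropy` are hypothetical, and the sketch only assumes that a two-way split is scored with the Shannon entropy of the two resulting proportions.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a one-sided split carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def split_entropy(values, threshold):
    """Entropy of the two-part split that `threshold` induces on `values`.

    Assumption for illustration: values equal to the threshold count as "under".
    """
    under = sum(1 for v in values if v <= threshold)
    return binary_entropy(under / len(values))


if __name__ == "__main__":
    # Proportions near one half all give entropy close to 1 bit,
    # so moving the threshold a little barely changes the entropy.
    for p in (0.40, 0.45, 0.50, 0.55, 0.60):
        print(p, binary_entropy(p))
```

Because the binary entropy curve is nearly flat around one half, a coarse scan over candidate thresholds can miss the change entirely, which is consistent with needing a loop over finer levels of granularity.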

The “final_delta” variable is the threshold value that divides the dataset. Test whether each value falls under or over it, and that gives you the two parts.
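As a minimal sketch of that last step, written in Python rather than taken from the attached code: the function name `split_by_threshold` is hypothetical, the sample values and the threshold of 1.0 are placeholders rather than output of the algorithm, and sending values exactly equal to the threshold to the "under" side is an assumption.

```python
def split_by_threshold(values, final_delta):
    """Divide `values` into the part at or under `final_delta` and the part over it.

    Assumption: ties (values equal to final_delta) go to the "under" part.
    """
    under = [v for v in values if v <= final_delta]
    over = [v for v in values if v > final_delta]
    return under, over


if __name__ == "__main__":
    # Placeholder data: small values stand in for "not moving", large for "moving".
    speeds = [0.01, 0.02, 3.5, 0.03, 4.1]
    final_delta = 1.0  # placeholder; in practice this comes from the algorithm
    still, moving = split_by_threshold(speeds, final_delta)
    print(still)   # [0.01, 0.02, 0.03]
    print(moving)  # [3.5, 4.1]
```

Every value lands in exactly one of the two parts, so the two lists together always account for the whole dataset.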

The attached code splits a dataset of 50 million real number values in about thirty seconds, running on an iMac.

Command line code: Split the data

Full set of algorithms: ResearchGate
