This is basically the same as the original algorithm on the topic, though I fixed a few bugs and added a command-line interface that makes it easy to generate a confidence/accuracy distribution. The runtime is astonishing: about 2 seconds per 30,000 rows. More to come soon.