UCI Sonar Dataset

I was on a Meetup video conference when someone mentioned a dataset that doesn’t cluster well, the UCI Sonar Dataset, so naturally I did a bit of work on it. It turns out the dataset contains very few clusters: only about 34% of the rows fall inside a spherical cluster, and the average cluster size, taken over all rows, is about 0.8 elements per cluster. In other words, you’re not going to get good clustering out of this dataset because, as a matter of geometry, there are no real clusters to begin with. Nearest Neighbor nonetheless performs reasonably well, with an accuracy of 82.692%.

Code attached.
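
Separately from the attached code, here is a minimal sketch of what a Nearest Neighbor accuracy check can look like. It is not the code behind the numbers above. The function name, the use of Euclidean distance, and the leave-one-out evaluation are all assumptions made for illustration, and the toy rows are made up rather than taken from the Sonar data.

```python
import math


def nearest_neighbor_loo_accuracy(rows, labels):
    """Leave-one-out 1-NN accuracy (hypothetical helper, Euclidean distance assumed).

    Each row is classified with the label of its closest other row,
    and the fraction of rows classified correctly is returned.
    """
    correct = 0
    for i, x in enumerate(rows):
        best_j, best_d = None, math.inf
        for j, y in enumerate(rows):
            if i == j:
                continue  # a row may not be its own neighbor
            d = math.dist(x, y)
            if d < best_d:
                best_j, best_d = j, d
        if labels[best_j] == labels[i]:
            correct += 1
    return correct / len(rows)


# Made-up toy data: two tight groups with different labels.
toy_rows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
toy_labels = ["R", "R", "R", "M", "M", "M"]

accuracy = nearest_neighbor_loo_accuracy(toy_rows, toy_labels)
print(accuracy)
```

To try this on the Sonar data, you would load the feature columns of each row into `rows` and the class column into `labels`. Nearest Neighbor only needs each row's closest neighbor to share its class, which is why it can do well even when the rows don't form real clusters.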
