Skin Cancer Classification

I’m going to write something more formal over the coming week, but in the meantime, here are the runtime and accuracy results for the methods I introduced in a prior article on medical imaging classification, as applied with my software to a skin cancer dataset from Harvard.

Summary of Results

Original Dataset Size: 7470 RGB images of various dimensions;

Compressed Dataset Size: 7470 x 198;

Preprocessing time: 37.9 seconds;

Supervision Training time: 55.9 seconds;

Prediction time: 13 seconds, on average (run 25 times);

Prediction accuracy:

Worst case, 85.542% (no supervision);

Best case, 95.8337% (highest level supervision, rejecting all but 62 rows).

Bottom line: Reliable diagnosis for over 7,000 patients, on a home computer, in about 2 minutes.
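
As a quick check on the "about 2 minutes" figure, the three timings listed above can simply be added up. The snippet below is only that arithmetic, and it assumes the stages run back to back, with a single prediction pass at the reported average:

```python
# Timings copied from the summary above, in seconds.
preprocessing_s = 37.9
training_s = 55.9
prediction_s = 13.0  # reported average over 25 runs

# Assumes the three stages run one after another, with one prediction pass.
total_s = preprocessing_s + training_s + prediction_s
total_min = total_s / 60

print(f"{total_s:.1f} s is roughly {total_min:.2f} min")
```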

Summary of Process

The dataset consists of just over 10,000 images of lesions. Each lesion belongs to one of seven classes, three of which are malignant. The algorithm consolidates all malignant classes into one class, and all benign classes into another. It removes all duplicate images, leaving only one image per patient (the 7,470 images listed above). All images are then compressed, and fed to a supervised algorithm that finds the minimum and maximum distances over which classification labels are consistent within the dataset. Prediction is then applied using decreasingly sensitive criteria for flagging predictions as outside the scope of the training dataset.
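
The steps above can be sketched roughly in Python. This is an illustration only, not the attached code: the class names, the patient IDs, the toy two-dimensional vectors, and the use of a single `max_dist` threshold with plain nearest-neighbor prediction are all assumptions made for the example. It consolidates labels into malignant and benign, keeps one row per patient, and then rejects any prediction whose nearest training row is further away than the threshold. Sweeping the threshold from strict to loose mimics the "decreasingly sensitive" criteria.

```python
import numpy as np

# Hypothetical label names; the post does not say which classes are malignant.
MALIGNANT = {"class_a", "class_b", "class_c"}

def consolidate(labels):
    """Map the original class labels to 1 (malignant) or 0 (benign)."""
    return np.array([1 if lab in MALIGNANT else 0 for lab in labels])

def one_row_per_patient(patient_ids):
    """Return the sorted indices of the first row seen for each patient."""
    _, first_idx = np.unique(patient_ids, return_index=True)
    return np.sort(first_idx)

def predict_with_rejection(train_x, train_y, test_x, max_dist):
    """Nearest-neighbor prediction; -1 marks a rejected row."""
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    preds = train_y[dists.argmin(axis=1)]
    # Reject rows whose nearest training row is further than max_dist.
    preds[dists.min(axis=1) > max_dist] = -1
    return preds

# Toy data: five images, two of which belong to the same patient (11).
labels = ["class_a", "class_d", "class_d", "class_b", "class_e"]
patients = np.array([10, 11, 11, 12, 13])
vectors = np.array([[0.0, 0.0], [5.0, 5.0], [5.0, 5.1], [0.0, 1.0], [6.0, 5.0]])

keep = one_row_per_patient(patients)
x, y = vectors[keep], consolidate(labels)[keep]

test_x = np.array([[0.1, 0.2], [5.5, 5.0], [20.0, 20.0]])
# Strictest threshold first, no rejection (infinite threshold) last.
for max_dist in (0.3, 1.0, np.inf):
    print(max_dist, predict_with_rejection(x, y, test_x, max_dist))
```

With the strictest threshold only the first test row is accepted, and with an infinite threshold nothing is rejected, which corresponds to the "no supervision" case in the results above.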

I’ve also attached a “STAT SUPERVISION” script that can be applied without consolidating classes, and generates about 80% accuracy (also using rejections). This is the same algorithm I introduced in Section 1.4 of this paper, for the “Statistical Spheres” dataset. The only difference here is that the clusters don’t all have the same classifier; the algorithm itself is exactly the same.
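
Since the accuracy figures here are quoted "using rejections", it may help to spell out one way such a number can be computed. The sketch below assumes accuracy is measured only over the rows that were not rejected, and it also reports how many rows were kept. That definition, the `-1` rejection marker, and the function name are assumptions for illustration, not taken from the attached scripts.

```python
import numpy as np

def accuracy_with_rejections(preds, truth, rejected=-1):
    """Return (accuracy over accepted rows, number of accepted rows)."""
    preds = np.asarray(preds)
    truth = np.asarray(truth)
    kept = preds != rejected
    n_kept = int(kept.sum())
    if n_kept == 0:
        # Every row was rejected, so accuracy is undefined.
        return float("nan"), 0
    return float((preds[kept] == truth[kept]).mean()), n_kept

# Toy example with unconsolidated classes numbered 0 through 6.
preds = [2, -1, 5, 0, -1, 3]
truth = [2, 4, 5, 1, 0, 3]
acc, n_kept = accuracy_with_rejections(preds, truth)
print(f"accuracy {acc:.2%} over {n_kept} accepted rows")
```

Reporting the accepted count alongside the accuracy matters, because a stricter rejection rule can raise accuracy while leaving very few rows classified.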

There’s another method called “Isolate Classes”, the code for which is also attached, though it doesn’t quite work yet; I’ll explain it fully in a separate post. This was actually the original approach, which is to isolate a single class, and try to identify which rows are in that class. It works out nicely on parallel machines, because you can run the tests for each class simultaneously, but that’s not something you can do on a PC.
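
The general shape of that idea can be sketched as a one-class-versus-the-rest loop. This is not the attached “Isolate Classes” code: the leave-one-out nearest-neighbor test is a stand-in for whatever per-class test is actually run, the function names are hypothetical, and a thread pool is used only to show that the per-class runs are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def isolate_class(labels, target):
    """Binary labels: 1 for rows in the target class, 0 for every other row."""
    return (np.asarray(labels) == target).astype(int)

def leave_one_out_accuracy(x, binary_y):
    """Stand-in per-class test: nearest-neighbor accuracy, excluding each row itself."""
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row may not be its own neighbor
    nearest = dists.argmin(axis=1)
    return float((binary_y[nearest] == binary_y).mean())

def run_all_classes(x, labels):
    """Run the per-class test for every class; each run is independent."""
    def run(target):
        return target, leave_one_out_accuracy(x, isolate_class(labels, target))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run, sorted(set(labels))))

# Toy data: three well-separated classes with two rows each.
x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]])
labels = ["a", "a", "b", "b", "c", "c"]
results = run_all_classes(x, labels)
print(results)
```

Because each class is tested on its own copy of the binary labels, the runs share no state, which is what makes the approach a natural fit for parallel hardware.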
