Classifier labels are typically arbitrary, even when they describe the thing at issue. For example, using the classifier label “1” for the digit 1 in the MNIST dataset doesn’t tell you how different an image of a 1 is from an image of a 2. This is generally not a question worth answering for prediction and clustering, since you don’t really care about the relative values of labels. In this case, it’s probably more important to report back that you’ve observed, e.g., a 1, rather than some label based upon the properties of the class of images in question.
You could, however, imagine other cases where you are interested in measuring how different the classes are from one another, for whatever reason. One simple way to do this is to take the average vector for each class (or otherwise create an abstraction for each class), sort those average vectors, and then take the differences between adjacent entries in the sorted list. Assign a label of zero to the first vector in the list, and generate the remaining labels by walking up the sorted list, adding the difference between each pair of adjacent vectors to the previous label. So the first average vector has a label of zero, the next has a label equal to the difference between the first and second vectors, and so on. This is exactly what I did in a previous algorithm, which you can read about in Section 1.3 (“Expanding Gas Dataset”) of my paper Vectorized Deep Learning.
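The labeling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the paper: in particular, the text doesn’t say what ordering or distance is used on the average vectors, so sorting by Euclidean norm and measuring adjacent differences with Euclidean distance are assumptions here, as is the function name `cumulative_class_labels`.

```python
import numpy as np

def cumulative_class_labels(vectors, classes):
    """Assign numeric labels to classes based on distances between class means.

    The first class (in sorted order) gets label 0; each subsequent class
    gets the previous label plus the distance to the previous class mean.
    """
    vectors = np.asarray(vectors, dtype=float)
    class_ids = sorted(set(classes))
    # Average vector for each class.
    means = {c: vectors[[k == c for k in classes]].mean(axis=0)
             for c in class_ids}
    # Sort the class means; Euclidean norm is an assumed sort key.
    order = sorted(class_ids, key=lambda c: np.linalg.norm(means[c]))
    # Walk the sorted list, accumulating adjacent distances as labels.
    labels = {order[0]: 0.0}
    for prev, cur in zip(order, order[1:]):
        labels[cur] = labels[prev] + np.linalg.norm(means[cur] - means[prev])
    return labels
```

On a toy one-dimensional dataset with class means at 0, 2, and 5, this produces labels 0, 2, and 5, so the labels themselves now encode how far apart the classes are, which is the point of the scheme.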