Note that there’s an algorithm in my code archive that can form categorizations on datasets that consist of discrete sets, as opposed to Euclidean vectors (see, “Optimize_categories_intersection_N”).
This algorithm would be useful for categorizing data that consists of items with arbitrary labels, rather than positions. For example, datasets of consumer preferences (i.e., foods, music, movies, etc.).
I’m working on a research note explaining some theoretical aspects of the algorithm, but it’s no different than the “Optimize_categories_N” algorithm that I introduced in my main research paper. The only material distinction is that the intersection algorithm uses the intersection operator to compare items, rather than the Euclidean norm operator. That is, the “Optimize_categories_N” algorithm compares two elements by taking the norm of the difference between those two elements, whereas the intersection algorithm uses the intersection count between two sets.
I can see that the vectorized intersection image algorithm is quite popular (based upon the stats), so I thought I’d call attention to this algorithm as well, since it uses the exact same operator I discussed in the vectorized image partition article.