I’ve applied my knowledge-based prediction algorithm to the UCI Credit Dataset (attached, already formatted), and the results are the same as always, with accuracy peaking near 100% (using only 3,500 rows). What’s interesting here is the use of ordinal values, in this case levels of education. Though I didn’t do it explicitly in this case, you could simply apply my normalization algorithm to the dimensions that contain ordinal values. The normalization algorithm would then test different orders of magnitude for the ordinal dimensions, thereby solving for the appropriate spacing between ordinal values. For example, if the possible ordinal rankings are (1, 2, 3), then the normalization algorithm will test, e.g., (10, 20, 30), (100, 200, 300), etc., each of which can produce a different accuracy, with the best one selected.
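To make the idea of scanning orders of magnitude concrete, here is a minimal sketch in Python. It is not my actual normalization or prediction code: it assumes a plain leave-one-out nearest-neighbor classifier as a stand-in for the prediction step, and the names `loo_nn_accuracy` and `best_ordinal_scale`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from math import dist


def loo_nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (stand-in predictor)."""
    correct = 0
    for i, row in enumerate(rows):
        # Find the closest *other* row by Euclidean distance.
        j = min((k for k in range(len(rows)) if k != i),
                key=lambda k: dist(row, rows[k]))
        correct += labels[j] == labels[i]
    return correct / len(rows)


def best_ordinal_scale(rows, labels, col, exponents=range(-2, 4)):
    """Multiply one ordinal column by successive powers of ten; keep the most accurate."""
    best = None  # (scale, accuracy)
    for e in exponents:
        scale = 10.0 ** e
        # Rescale only the ordinal column, leaving the other dimensions untouched.
        scaled = [r[:col] + [r[col] * scale] + r[col + 1:] for r in rows]
        acc = loo_nn_accuracy(scaled, labels)
        if best is None or acc > best[1]:
            best = (scale, acc)
    return best


# Toy data (made up): column 0 is an uninformative numeric feature,
# column 1 is an ordinal level (1 or 3) that determines the class.
rows = [[0.0, 1], [5.0, 3], [10.0, 1], [15.0, 3], [20.0, 1], [25.0, 3]]
labels = [0, 1, 0, 1, 0, 1]

scale, accuracy = best_ordinal_scale(rows, labels, col=1)
print(scale, accuracy)
```

On this toy data, the raw ordinal values (1, 3) are too close together relative to the other column, so the nearest neighbor is always in the wrong class. Multiplying the ordinal column by 10 spreads the levels far enough apart that the ordinal dimension dominates the distance, and the scan keeps that multiplier.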
Below is a plot of accuracy as a function of knowledge, which you can read about in the first paper linked to above:
Note that this scaling approach will not work for truly arbitrary labelled data, where the values have no obvious numerical meaning (e.g., colors). In that case, you could iterate through different ordinal rankings of the labels (i.e., permutations), and then, for each ordinal ranking, use the normalization method to solve for the best numerical values for the labels.
Attached is command line code that both normalizes and runs predictions on the UCI Credit Dataset. This is going to be included in some commercial release of my software, but given that it’s so powerful, I’m probably not going to release it on the Apple App Store as part of my Kickstarter Campaign (at least not this version of it).
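For labels with no natural order, the permutation search described above can be sketched in the same way. Again, this is only an illustration under stated assumptions, not the attached code: the predictor is a simple leave-one-out nearest-neighbor classifier, and `best_label_encoding` and the color data are hypothetical. Each permutation of the labels is treated as a candidate ordinal ranking, and each ranking is then tried at several orders of magnitude.

```python
from itertools import permutations
from math import dist


def loo_nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier (stand-in predictor)."""
    correct = 0
    for i, row in enumerate(rows):
        j = min((k for k in range(len(rows)) if k != i),
                key=lambda k: dist(row, rows[k]))
        correct += labels[j] == labels[i]
    return correct / len(rows)


def best_label_encoding(rows, labels, col, exponents=range(0, 3)):
    """Try every ordering of the categories in one column, at several scales."""
    categories = sorted({r[col] for r in rows})
    best = None  # (ranking, scale, accuracy)
    for order in permutations(categories):
        # Assign ranks 1, 2, 3, ... according to this candidate ordering.
        rank = {c: i + 1 for i, c in enumerate(order)}
        for e in exponents:
            scale = 10.0 ** e
            encoded = [r[:col] + [rank[r[col]] * scale] + r[col + 1:]
                       for r in rows]
            acc = loo_nn_accuracy(encoded, labels)
            if best is None or acc > best[2]:
                best = (order, scale, acc)
    return best


# Toy data (made up): column 0 is numeric, column 1 is an unordered color label.
rows = [[0.0, "red"], [5.0, "green"], [10.0, "red"],
        [15.0, "green"], [20.0, "blue"], [25.0, "blue"]]
labels = [0, 1, 0, 1, 0, 0]

ranking, scale, accuracy = best_label_encoding(rows, labels, col=1)
print(ranking, scale, accuracy)
```

The number of rankings grows factorially with the number of distinct labels, so an exhaustive search like this is only practical when a dimension has a handful of possible values.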