# Nearest Neighbor Clustering

Following up on my previous post, below is code that generates clusters given the nearest neighbors of a dataset. This ends up generating clusters for each vector, and also, a graph for the entire dataset, that represents how vectors are connected to one another through being the nearest neighbors of one another.

To generate a cluster, simply take the i-th row of the graph matrix, and that row will tell you which vectors generated row i as their nearest neighbor.

I just posted something related on Twitter, regarding graph entropy, and it dawned on me, that it’s impossible for the resultant graph to be complete –

Every vector will have a nearest neighbor, and therefore, the out-degree of every vertex in the resultant graph (this is a directed graph) is greater than zero, and exactly one. However, if any vector is the nearest neighbor of every other vector, then all vectors point to that single vector (the “hub vector”), and therefore, no other vector can be the nearest neighbor of any other vector, other than the one vector that is the nearest neighbor of the hub vector.

In fact, if this is the case, then there is necessarily a directed loop in the graph, from the hub vector, to its nearest neighbor, and back.

function graph_matrix = generate_NN_graph(dataset, nearest_neighbors, N)

num_rows = size(dataset,1);
graph_matrix = zeros(num_rows,num_rows);
for i = 1 : num_rows
a = find(nearest_neighbors == i);
graph_matrix(i,a) = 1;
endfor

endfunction