PassionatePossum t1_iu3gete wrote on October 28, 2022 at 7:38 AM

Reply to comment by sapnupuasop in [D] [R] Large-scale clustering by jesusfbes

I have done it before and it works well. But I guess it depends on the use-case. It is a classic technique in computer vision to cluster SIFT vectors (128 dimensions) on a training dataset. You then describe any image as a set of „visual words“ (i.e. the IDs of the clusters its SIFT vectors fall into).

A colleague of mine wrote the clustering algorithm himself. It was just a normal k-means with the nearest neighbor search replaced by an approximate nearest neighbor search to speed things up.