Submitted by jesusfbes t3_yexifs in MachineLearning
Is there any computationally efficient implementation of clustering methods, beyond vanilla k-means, for large amounts of data? Let's say 1M data points with hundreds of dimensions. I know about scikit-learn and a couple of others, but I wonder whether they will scale to data of that size.
TheLionKing2020 t1_iu1bcw8 wrote
Well, you don't need to train on all of that data.

First take samples of 10k, 50k, and 100k points and see whether the results differ. Do you get a different number of clusters?
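A minimal sketch of that subsampling comparison, assuming scikit-learn is acceptable (MiniBatchKMeans just to keep each fit cheap; the array `X` is random stand-in data for your real 1M x 100 matrix, and the candidate k range is arbitrary):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000_000, 100))  # placeholder for the real dataset

for n in (10_000, 50_000, 100_000):
    # draw a random subsample of n points
    idx = rng.choice(X.shape[0], size=n, replace=False)
    sample = X[idx]

    scores = {}
    for k in range(2, 11):  # arbitrary range of candidate cluster counts
        labels = MiniBatchKMeans(n_clusters=k, n_init=10,
                                 random_state=0).fit_predict(sample)
        # silhouette on a further subsample keeps the evaluation fast
        scores[k] = silhouette_score(sample, labels,
                                     sample_size=5_000, random_state=0)

    best_k = max(scores, key=scores.get)
    print(f"sample size {n}: best k by silhouette = {best_k}")
```

If the preferred k (and the cluster structure) is stable across the three sample sizes, you probably don't need to fit on the full 1M points at all; if it keeps changing, that's a sign you need more data or a different algorithm.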