jesusfbes OP t1_iu0vao5 wrote on October 27, 2022 at 6:49 PM

Reply to comment by sapnupuasop in [D] [R] Large-scale clustering by jesusfbes

It does make sense if your data points are high-dimensional, say vector embeddings for example. I believe Spark is an option: for algorithms that are not implemented there, such as spectral clustering, you have the primitives to build them yourself. Thank you for your response.
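
To make "build it from the primitives" a bit more concrete, here is a toy sketch of the steps behind spectral clustering: an affinity matrix, the graph Laplacian, and an eigenvector of that Laplacian. It is plain single-machine Python, not Spark, and it only splits the data into two groups by the sign of the second-smallest eigenvector (the Fiedler vector). The function name `spectral_bipartition` and the parameters `sigma` and `iters` are made up for illustration. A real large-scale version would presumably need a sparse affinity matrix and a distributed eigensolver.

```python
import math
import random

def spectral_bipartition(points, sigma=1.0, iters=300):
    """Split points into two groups using the sign of the Fiedler vector."""
    n = len(points)
    # 1. Dense affinity matrix from a Gaussian (RBF) kernel, zero on the diagonal.
    W = [[0.0 if i == j
          else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    # 2. Degrees; the unnormalized Laplacian is L = D - W (applied implicitly below).
    deg = [sum(row) for row in W]
    shift = 2 * max(deg)  # upper bound on the largest eigenvalue of L
    # 3. Power iteration on (shift*I - L), with the constant vector projected out,
    #    converges to the eigenvector of L with the second-smallest eigenvalue.
    rng = random.Random(0)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [(shift - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # 4. The sign of each entry gives the cluster assignment.
    return [1 if x > 0 else 0 for x in v]

# Two small, well-separated 2-D blobs as made-up demo data.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
blob_b = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(20)]
labels = spectral_bipartition(blob_a + blob_b)
print(labels)
```

The dense n-by-n affinity matrix is exactly the part that stops scaling, which is why a sparse or approximate affinity would be needed for large data.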

sapnupuasop t1_iu0zjy5 wrote on October 27, 2022 at 7:17 PM

Yeah, I was thinking of the curse of dimensionality: with standard Euclidean distance, for example, distances in high dimensions lose their meaning, but there are surely other distance measures that could work there. Btw, I have used sklearn to cluster a couple million rows successfully.
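
A quick way to see what "distances lose their meaning" looks like is to measure how far apart the nearest and farthest neighbours of a query point are, relative to the nearest one. This is a minimal sketch on uniformly random data using only the standard library. The helper name `relative_contrast` and the sample sizes are arbitrary choices for illustration, and real embeddings may behave differently from uniform noise.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    """Return (max - min) / min of Euclidean distances from one random
    query point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# The contrast should shrink as the dimension grows: the nearest and the
# farthest point end up at almost the same distance from the query.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(500, dim), 3))
```

When the printed contrast gets close to zero, "nearest" and "farthest" are nearly indistinguishable under Euclidean distance, which is the effect described above.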