Submitted by zanzagaes2 t3_10xt36j in MachineLearning
zanzagaes2 OP t1_j7unjuw wrote
Reply to comment by schludy in [P] Creating an embedding from a CNN by zanzagaes2
I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer before pooling), all generated from the full training set (~30,000 samples).
In all cases I do the same thing: I use PCA to reduce the vectors to 1,000 features, then UMAP or t-SNE (I usually try both) to get a 2D embedding I can scatter plot. I have tried using UMAP for the full reduction, but it doesn't scale well enough. Is this a good approach?
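A minimal sketch of that pipeline, assuming scikit-learn and umap-learn; the array names and file paths are just illustrative placeholders:

```python
# Minimal sketch of the PCA -> UMAP / t-SNE pipeline described above.
# Feature/label arrays and file names are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
import umap  # pip install umap-learn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

features = np.load("cnn_features.npy")  # (n_samples, n_features), e.g. (30000, 20000)
labels = np.load("labels.npy")          # per-sample class labels, used only for colouring

# Stage 1: linear reduction with PCA to a manageable dimensionality.
reduced = PCA(n_components=1000, random_state=0).fit_transform(features)

# Stage 2: non-linear projection to 2D (try both UMAP and t-SNE).
emb_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(reduced)
emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(reduced)

# Scatter plot of one of the 2D embeddings.
plt.scatter(emb_umap[:, 0], emb_umap[:, 1], c=labels, s=2, cmap="tab10")
plt.title("UMAP of PCA-reduced CNN features")
plt.show()
```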
Edit: I have added an image of the best embedding I have found so far, for reference.
schludy t1_j7v11vj wrote
The individual steps sound ok; given that you're projecting 20,000 dimensions down to 2D, the results you got look very reasonable. I'm not sure about UMAP, but I think for t-SNE it's recommended to start from low dimensionality, something on the order of 32 features. I would probably try adjusting the architecture, as other comments have suggested.
zanzagaes2 OP t1_j7vpols wrote
You are right, both the t-SNE and UMAP documentation recommend reducing to 30-50 features before applying them. In this case the result is quite similar to the one I found, though.
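For reference, the same sketch with the documented recommendation of ~50 PCA components before t-SNE (reusing the hypothetical `features` array from above):

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Reduce to ~50 components first, as the t-SNE/UMAP docs recommend,
# then project to 2D.
reduced_50 = PCA(n_components=50, random_state=0).fit_transform(features)
emb_tsne_50 = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(reduced_50)
```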