zanzagaes2 OP t1_j7vpols wrote
Reply to comment by schludy in [P] Creating an embedding from a CNN by zanzagaes2
You are right: both the t-SNE and UMAP documentation recommend reducing to 30-50 features before applying them. In this case, though, the result is quite similar to the one I found.
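For reference, that recommended two-stage pipeline can be sketched with scikit-learn alone (the `features` array here is a random stand-in for the real CNN features, and all sizes are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Placeholder for the real CNN features (n_samples x n_features).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 500))

# Step 1: PCA down to ~50 components, as the t-SNE/UMAP docs suggest.
reduced = PCA(n_components=50, random_state=0).fit_transform(features)

# Step 2: t-SNE (UMAP would slot in the same way) down to a 2-D layout
# suitable for a scatter plot.
coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(reduced)
print(coords.shape)  # (200, 2)
```

The PCA step both denoises the features and makes the neighbor computations inside t-SNE/UMAP far cheaper.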
zanzagaes2 OP t1_j7vpd89 wrote
Reply to comment by schludy in [P] Creating an embedding from a CNN by zanzagaes2
Yes, I think that's the case: I get far more reasonable values when I compare the 2d/3d projections of the embedding rather than the full 500-feature vectors.
Is there a better way to do this than projecting into a smaller space (using dimensionality reduction techniques or an encoder-decoder approach) and using L2 distance there?
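One standard alternative (not from this thread, just a common trick) is to skip the projection and L2-normalize the full 500-feature vectors, so that dot products become cosine similarities; cosine tends to behave better than raw Euclidean distance in high dimensions. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 500))  # placeholder CNN embeddings

# L2-normalize rows so that dot products equal cosine similarities.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Cosine similarity of one query image against the whole set.
query = normed[0]
sims = normed @ query          # shape (1000,); sims[0] == 1.0 (self-match)
top5 = np.argsort(-sims)[:5]   # indices of the 5 most similar images
print(top5[0])  # 0: the query is its own best match
```

Note that on normalized vectors Euclidean distance and cosine similarity give the same ranking (d² = 2 − 2·cos), so this is mainly about normalizing before comparing.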
zanzagaes2 OP t1_j7uual3 wrote
Reply to comment by Tober447 in [P] Creating an embedding from a CNN by zanzagaes2
Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding, and use the current one (~500 features) to find images similar to a given one, right?
Your perspective has been really helpful, thank you
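Finding similar images with a ~500-feature embedding is a straightforward nearest-neighbor query; a minimal sketch with scikit-learn (the embedding matrix is random placeholder data):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 500))  # stand-in for the real embedding

# Build a nearest-neighbor index over the embedding.
index = NearestNeighbors(n_neighbors=6, metric="euclidean").fit(embeddings)

# Query with image 0; the first hit is the image itself (distance 0),
# the remaining 5 are its most similar images.
dists, idxs = index.kneighbors(embeddings[:1])
print(idxs[0][0])  # 0
```

For ~30,000 samples brute-force search like this is still fast; approximate indexes only become necessary at much larger scales.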
zanzagaes2 OP t1_j7unuq2 wrote
Reply to comment by Tober447 in [P] Creating an embedding from a CNN by zanzagaes2
Can I reuse part of the trained model to avoid retraining from scratch? The current model has very decent precision and I have already generated other visualizations for it (like heatmaps), so building on this model would be very convenient.
Edit: I have added an image of the best embedding I have found so far, for reference
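Reusing a trained network as a feature extractor usually just means dropping the classification head and freezing what remains; a minimal PyTorch sketch, with a toy CNN standing in for the real trained model (all layers and sizes here are made up):

```python
import torch
import torch.nn as nn

# Toy stand-in for an already-trained CNN classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),            # classification head
)

# Reuse everything except the final head as a frozen feature extractor.
extractor = nn.Sequential(*list(model.children())[:-1]).eval()
for p in extractor.parameters():
    p.requires_grad_(False)

with torch.no_grad():
    feats = extractor(torch.randn(4, 3, 32, 32))
print(feats.shape)  # torch.Size([4, 8])
```

The frozen extractor can then feed either the embedding directly or serve as the (fixed) encoder of an encoder-decoder, so only the decoder needs training.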
zanzagaes2 OP t1_j7unjuw wrote
Reply to comment by schludy in [P] Creating an embedding from a CNN by zanzagaes2
I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer, before pooling), all generated from the full training set (~30,000 samples).
In all cases I do the same: I use PCA to reduce the vectors to 1,000 features, then UMAP or t-SNE (I usually try both) to get a 2d vector I can scatter plot. I have tried using UMAP for the full process but it doesn't scale well enough. Is this a good approach?
Edit: I have added an image of the best embedding I have found so far, for reference
Submitted by zanzagaes2 t3_10xt36j in MachineLearning
zanzagaes2 OP t1_j7w5sr1 wrote
Reply to comment by lonelyrascal in [P] Creating an embedding from a CNN by zanzagaes2
I will try an encoder-decoder architecture, mainly to improve the embedding. So far the asymptotics of PCA have not been a problem: the sklearn implementation runs PCA on ~1,000-feature vectors almost instantly.
Do you have a reference for any encoder-decoder architecture I could use?