
zanzagaes2 OP t1_j7w5sr1 wrote

I will try an encoder-decoder architecture, mainly to try to improve the embedding. Right now the computational cost of PCA has not been a problem: the scikit-learn implementation handles ~1,000-feature vectors almost immediately.
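For reference, the PCA step I mean is just this (a minimal sketch; the data here is a random placeholder for the real feature matrix, and the component count is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(30_000, 1_000)  # placeholder for the real feature matrix

pca = PCA(n_components=50)                # target dimensionality is a guess
reduced = pca.fit_transform(features)
print(reduced.shape)                      # (30000, 50)
```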

Do you have a reference for an encoder-decoder architecture I could use?
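To make the question concrete, this is the kind of thing I have in mind (only a sketch, assuming PyTorch; the layer sizes are placeholders):

```python
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=500, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)      # low-dimensional embedding
        return self.decoder(z)   # reconstruction

# Train with an MSE reconstruction loss, then keep only the encoder
# to embed the feature vectors.
```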

1

zanzagaes2 OP t1_j7vpd89 wrote

Yes, I think that's the case: I get far more reasonable values when I compare the 2d/3d projections of the embedding rather than the full 500-feature vectors.

Is there a better way to do this than projecting into a smaller space (using dimensionality reduction techniques or an encoder-decoder approach) and using the L2 distance there? A sketch of what I mean is below.
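For concreteness (scikit-learn assumed; the data and dimensions are placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import euclidean_distances

embeddings = np.random.rand(30_000, 500)  # placeholder for the real embeddings
reduced = PCA(n_components=10).fit_transform(embeddings)

d_full = euclidean_distances(embeddings[:1], embeddings)  # L2 in the full 500-d space
d_low = euclidean_distances(reduced[:1], reduced)         # L2 after projecting to 10d
```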

1

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find images similar to a given one, right? Something like the sketch below.
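A rough version of the similar-image lookup on the ~500-feature embedding (scikit-learn assumed; the data is a placeholder):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

embeddings = np.random.rand(30_000, 500)  # placeholder for the real embeddings

# Index the full embedding once, then query for the k most similar images.
index = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(embeddings)
distances, neighbors = index.kneighbors(embeddings[42:43])  # neighbors of image 42
```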

Your perspective has been really helpful, thank you

2

zanzagaes2 OP t1_j7unuq2 wrote

Can I reuse part of the trained model to avoid retraining from scratch? The current model has very decent precision and I have already generated other visualizations for it (such as heatmaps), so building on this model would be very convenient.
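Roughly what I mean (a sketch assuming PyTorch; a torchvision ResNet stands in for the actual network, and the checkpoint path is hypothetical):

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()                          # stand-in for the trained model
# model.load_state_dict(torch.load("weights.pt"))  # hypothetical checkpoint path

for p in model.parameters():
    p.requires_grad = False                        # keep the learned weights frozen

# Swap in a small trainable head that outputs a low-dimensional embedding.
model.fc = nn.Linear(model.fc.in_features, 32)
```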


1

zanzagaes2 OP t1_j7unjuw wrote

I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer before pooling), all generated from the full training set (~30,000 samples).

In all cases I do the same: I use PCA to reduce the vectors to 1,000 features and then UMAP or t-SNE (I usually try both) to get a 2d vector I can scatter-plot. I have tried using UMAP for the full process but it doesn't scale well enough. Is this a good approach?
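In code, the whole pipeline is roughly this (a sketch; the data is a random placeholder, the real matrix is ~30,000 × ~20,000):

```python
import numpy as np
import matplotlib.pyplot as plt
import umap                              # umap-learn package
from sklearn.decomposition import PCA

features = np.random.rand(3_000, 2_000)  # placeholder; the real matrix is much larger

reduced = PCA(n_components=1_000).fit_transform(features)  # PCA down to 1,000 features
points = umap.UMAP(n_components=2).fit_transform(reduced)  # UMAP down to 2d

plt.scatter(points[:, 0], points[:, 1], s=1)
plt.show()
```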

Edit: I have added an image of the best embedding I have found so far, for reference.

1