Submitted by zanzagaes2 t3_10xt36j in MachineLearning

Hi all: I have trained a CNN (efficientnet-b3) to classify the degree of a disease on medical images. I would like to create an embedding both to visualize relationships between images (after projecting to 2D or 3D space) and to find images similar to a given one.

I have tried using the output of the last convolution, both before and after pooling, for all training images (~30,000), but the result is mediocre: dissimilar images end up quite close in the embedding, and plotting it in 2D or 3D just shows a point cloud with no obvious pattern.

I have also tried using the class activation map (the output of the convolutional layer after pooling, multiplied by the classifier weights of the predicted class). This is quite a bit better, but classes are still not separated very clearly in the scatter plot.

Is there any other sensible way to generate the embeddings? I have tried using the hidden representations of earlier convolutional layers, but some of them are so large (~650,000 features per sample) that creating a reasonably sized embedding would require very aggressive PCA.
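For reference, the extraction looks roughly like this (a minimal sketch assuming a torchvision efficientnet_b3 head; the actual checkpoint, number of classes and data loading are omitted):

```python
import torch
from torchvision import models

# Hypothetical setup: an efficientnet_b3 fine-tuned for `num_classes` disease grades
num_classes = 5
model = models.efficientnet_b3(num_classes=num_classes)
# model.load_state_dict(torch.load("checkpoint.pt"))  # placeholder path
model.eval()

@torch.no_grad()
def embeddings(x):
    """x: batch of images [B, 3, H, W] -> three candidate embeddings."""
    fmap = model.features(x)            # last conv output before pooling, [B, C, h, w]
    pooled = fmap.mean(dim=(2, 3))      # global average pooling, [B, C]
    logits = model.classifier(pooled)   # dropout + linear head
    pred = logits.argmax(dim=1)         # predicted class per image
    w = model.classifier[1].weight      # classifier weights, [num_classes, C]
    cam_emb = pooled * w[pred]          # CAM-style embedding: features weighted by
                                        # the predicted class's classifier weights
    return fmap.flatten(1), pooled, cam_emb
```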


Example scatter plot of the heatmap embedding. While it is okay-ish (classes are more or less spatially localized), it would be great to find an embedding that creates more visible clusters for each class.

https://preview.redd.it/l7smdyuml6ha1.png?width=543&format=png&auto=webp&v=enabled&s=1c9a872ff73eea199e4977a1375303bcffe00158


1

Comments


Tober447 t1_j7u90qp wrote

You could try an autoencoder with CNN layers and a bottleneck of 2 or 3 neurons to be able to visualize these embeddings. The autoencoder can be interpreted as non-linear PCA.


Also, similarity in this embedding space should correlate with similarity of the real images/whatever your CNN extracts from the real images.

5

schludy t1_j7ula73 wrote

How do you plot the embeddings in 2D exactly? What is the size of the embeddings that you're trying to visualize?

1

zanzagaes2 OP t1_j7unjuw wrote

I have not found a very convincing embedding yet. I have tried several, ranging from ~500 features (class activation map) to ~20,000 features (output of the last convolutional layer before pooling), all generated from the full training set (~30,000 samples).

In all cases I do the same thing: I use PCA to reduce the vectors to 1,000 features and then UMAP or t-SNE (I usually try both) to get a 2D vector I can scatter-plot. I have tried using UMAP for the full process, but it doesn't scale well enough. Is this a good approach?
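In case it helps, the pipeline is roughly the sketch below (not my exact code; the file name is a placeholder):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # from the umap-learn package

# Placeholder: a [30000, 500..20000] matrix of per-image features
feats = np.load("embeddings.npy")

# Step 1: linear reduction with PCA (I use 1,000 components; the t-SNE/UMAP
# docs suggest something closer to 30-50)
reduced = PCA(n_components=1000, random_state=0).fit_transform(feats)

# Step 2: non-linear projection to 2D for the scatter plot
xy_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(reduced)
xy_tsne = TSNE(n_components=2, init="pca", random_state=0).fit_transform(reduced)
```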

Edit: I have added an image of the best embedding I have found until now as a reference

1

zanzagaes2 OP t1_j7unuq2 wrote

May I reuse some part of the trained model to avoid retraining from scratch? The current model has very decent precision, and I have generated some other visualizations from it (like heatmaps), so working around this model would be very convenient.

Edit: I have added an image of the best embedding I have found until now as a reference

1

Tober447 t1_j7uq41s wrote

You would take the output of a layer of your choice from the trained CNN (as you do now) and feed it into a new model, the autoencoder. So yes, the weights from your model are kept, but you will have to train the autoencoder from scratch. Something like CNN (inference only, no backprop) --> Encoder --> Latent Space --> Decoder for training; at inference time you take the output of the encoder (the latent representation) and use it for visualization or similarity.
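Roughly like the sketch below (the layer sizes and training loop are illustrative, and I'm assuming the ~500-dimensional features as input):

```python
import torch
from torch import nn

class FeatureAutoencoder(nn.Module):
    """MLP autoencoder over frozen CNN features (e.g. a ~500-d embedding)."""
    def __init__(self, in_dim=500, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),   # 2-3 neuron bottleneck for plotting
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Placeholder: features computed once by the trained CNN, no backprop into it
feats = torch.randn(30000, 500)
ae = FeatureAutoencoder(in_dim=feats.shape[1], latent_dim=2)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

for epoch in range(50):                   # illustrative full-batch training loop
    recon, _ = ae(feats)
    loss = nn.functional.mse_loss(recon, feats)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, latent = ae(feats)                 # [N, 2] coordinates for visualization
```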

4

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Your perspective has been really helpful, thank you

2

Tober447 t1_j7uyy1n wrote

>I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Exactly. :-)

1

schludy t1_j7v11vj wrote

The individual steps sound OK; however, if you project 20,000 dimensions down to 2D, the results you got look very reasonable. I'm not sure about UMAP, but for t-SNE it's recommended to start from a low dimensionality, something more on the order of 32 features. I would probably try to adjust the architecture, as other comments have suggested.

1

schludy t1_j7v9pkm wrote

I think you're underestimating the curse of dimensionality. In 500 dimensions, most vectors will be far away from each other. You can't just use the L2 norm when comparing vectors in such a high-dimensional space.
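A quick way to see the effect (an illustrative snippet, not your data): the relative spread of pairwise L2 distances between random points shrinks as the dimension grows, so in ~500-d almost everything looks roughly equidistant.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
for d in (2, 50, 500):
    x = rng.normal(size=(1000, d))        # 1,000 random points in d dimensions
    dists = pdist(x)                      # all pairwise L2 distances
    print(d, dists.std() / dists.mean())  # relative spread drops as d grows
```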

2

zanzagaes2 OP t1_j7vpd89 wrote

Yes, I think that's the case: I get far more reasonable values when comparing the 2D/3D projections of the embedding rather than the full 500-feature vectors.

Is there a better way to do this than projecting into a smaller space (using dimensionality-reduction techniques or an encoder-decoder approach) and using L2 there?
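For reference, the lookup I have in mind looks roughly like this (a sketch with placeholder names; cosine distance is shown as one alternative to plain L2):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

feats = np.load("embeddings.npy")               # placeholder: [N, 500] features
reduced = PCA(n_components=50, random_state=0).fit_transform(feats)

# Cosine distance is often better behaved than raw L2 for high-dim CNN features
index = NearestNeighbors(n_neighbors=6, metric="cosine").fit(reduced)
dist, idx = index.kneighbors(reduced[:1])       # neighbours of the first image
print(idx[0][1:])                               # drop the query image itself
```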

1

zanzagaes2 OP t1_j7vpols wrote

You are right: both the t-SNE and UMAP documentation recommend reducing to 30-50 features before using them. In this case the result is quite similar to the one I found, though.

1

lonelyrascal t1_j7vwy0c wrote

Exact PCA scales cubically (the eigendecomposition of the covariance matrix is O(d^3) in the feature dimension). Instead of doing that, why don't you pass the embedding through an autoencoder?

1

zanzagaes2 OP t1_j7w5sr1 wrote

I will try the encoder-decoder architecture, mainly to try to improve the embedding. Right now the asymptotics of PCA have not been a problem: the sklearn implementation reduces the vectors to ~1,000 features almost immediately.

Do you have any reference on any encoder-decoder architecture I can use?

1

mrtransisteur t1_j7xt1e5 wrote

You want to model:

p(cluster = c | img)

p(c1 == c2 | dist(c1, c2) = d, img1 in c1, img2 in c2)

You could try a couple things:

  • Fréchet Inception Distance, but instead of the Inception model you use the medical CNN's activations (see the sketch after this list)

  • distance metric learning

  • hdbscan/umap/etc for clustering

  • persistent homology based topological data analysis methods for finding clusters

  • masked autoencoders for good feature extraction

  • JEPA style architecture
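For the first bullet, a rough sketch of a Fréchet-style distance computed on the medical CNN's activations instead of Inception features (array names and shapes are assumptions):

```python
import numpy as np
from scipy import linalg

def frechet_distance(acts_a, acts_b):
    """Fréchet distance between two sets of CNN activations, each [N, D]."""
    mu_a, mu_b = acts_a.mean(axis=0), acts_b.mean(axis=0)
    cov_a = np.cov(acts_a, rowvar=False)
    cov_b = np.cov(acts_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)     # matrix square root of the product
    if np.iscomplexobj(covmean):              # drop tiny imaginary residue
        covmean = covmean.real
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

# e.g. distance between activations of two predicted disease grades
# print(frechet_distance(acts_grade_0, acts_grade_1))
```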

2