Simusid OP t1_jbt91tb wrote on March 11, 2023 at 3:16 PM

My main goal was just to visualize the embeddings to see if they are grossly different. They are not, though that is only a qualitative view. My second goal was to use the embeddings with a trivial supervised classifier. The dataset is labeled with four labels, so I made a generic network to see if there was any consistency in the training. Regardless of hyperparameters, the OpenAI embeddings seemed to always outperform the SentenceTransformer embeddings, slightly but consistently.

This was not meant to be rigorous. I did this to get a general feel of the quality of the embeddings, plus to get a little experience with the OpenAI API.
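
For anyone who wants to try the same kind of sanity check, here is a minimal sketch of comparing two embedding sets with a trivial classifier. It is not the network used above: it swaps in a nearest-centroid classifier to stay dependency-free, and the embeddings are synthetic Gaussian blobs standing in for real OpenAI and SentenceTransformer vectors. The names `make_embeddings`, `fit_centroids`, and `accuracy` are all hypothetical.

```python
import random

def make_embeddings(n_per_label, dim, noise, seed):
    """Synthetic stand-in for real embeddings: one Gaussian blob per label (4 labels)."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]
    X, y = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_label):
            X.append([v + rng.gauss(0, noise) for v in center])
            y.append(label)
    return X, y

def split(X, y, test_fraction, seed):
    """Shuffle indices and split into train and test parts."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_centroids(X, y):
    """Compute the mean vector of each label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)))

def accuracy(X, y, test_fraction=0.25, seed=0):
    """Held-out accuracy of the nearest-centroid classifier for one random split."""
    X_train, y_train, X_test, y_test = split(X, y, test_fraction, seed)
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)

# Two hypothetical embedding sets for the same labeled items.
emb_a, labels = make_embeddings(50, 16, noise=1.0, seed=1)
emb_b, _ = make_embeddings(50, 16, noise=1.5, seed=2)

for name, emb in [("A", emb_a), ("B", emb_b)]:
    # Repeat over several splits so a single lucky split does not decide the comparison.
    scores = [accuracy(emb, labels, seed=s) for s in range(5)]
    print(name, round(sum(scores) / len(scores), 3))
```

Averaging over several splits is the cheap analogue of "regardless of hyperparameters": a gap that persists across splits is more believable than one measured once.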

quitenominal t1_jbtr6g7 wrote on March 11, 2023 at 5:24 PM

fwiw this has also been my finding when comparing these two embeddings for classification tasks. Better, but not enough to justify the cost

polandtown t1_jbu2zqe wrote on March 11, 2023 at 6:47 PM

Learning here, but how are your axes defined? Some kind of factor(s) or component(s) extracted from each individual embedding? Thanks for the visualization, as it made me curious and interested! Good work!

Simusid OP t1_jbu3q8m wrote on March 11, 2023 at 6:52 PM

Here is some explanation about UMAP axes and why they should usually be ignored: https://stats.stackexchange.com/questions/527235/how-to-interpret-axis-of-umap

Basically it's because they are nonlinear.
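
A small sketch of a related point, under the assumption that what you read off such a plot is how close points are to each other: rotating a 2-D layout changes every x and y value but leaves all pairwise distances intact, so the particular axis directions carry no information. The layout here is random points, not actual UMAP output, and `rotate` and `pairwise` are hypothetical helper names.

```python
import math
import random

def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def pairwise(points):
    """All pairwise Euclidean distances, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

rng = random.Random(0)
layout = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]  # stand-in for a 2-D plot
rotated = rotate(layout, math.radians(37))

# Coordinates differ, but the largest change in any pairwise distance is only rounding error.
max_diff = max(abs(a - b) for a, b in zip(pairwise(layout), pairwise(rotated)))
print(max_diff)
```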

onkus t1_jbwftny wrote on March 12, 2023 at 6:21 AM

Doesn’t this also make it essentially impossible to compare the two figures you’ve shown?

Thog78 t1_jbyh4w1 wrote on March 12, 2023 at 6:24 PM

What you're looking for when comparing UMAPs is if the local relationships are the same. Try to recognize clusters and see their neighbors, or whether they are distinct or not. A much finer colored clustering based on another reduction (typically PCA) helps with that. Without clustering, you can only try to recognize landmarks from their size and shape.
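
One hedged way to put a number on "are the local relationships the same" is to compare each item's k nearest neighbours in the two spaces. The sketch below is an illustration only, not something used in this thread: `knn` and `neighbour_overlap` are hypothetical names, the data is random, and the brute-force search is only suitable for small samples.

```python
import math
import random

def knn(points, k):
    """For each point, the set of indices of its k nearest neighbours (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append({j for _, j in dists[:k]})
    return result

def neighbour_overlap(points_a, points_b, k=10):
    """Mean Jaccard overlap of the k-NN sets of the same items in two spaces."""
    nn_a, nn_b = knn(points_a, k), knn(points_b, k)
    scores = [len(a & b) / len(a | b) for a, b in zip(nn_a, nn_b)]
    return sum(scores) / len(scores)

rng = random.Random(0)
space_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]
# Same items, uniformly rescaled: coordinates change, neighbourhoods do not.
space_scaled = [[2.0 * v for v in p] for p in space_a]
# Unrelated layout of the same 100 items: neighbourhoods should barely agree.
space_random = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(100)]

print("scaled copy:", neighbour_overlap(space_a, space_scaled))
print("unrelated:  ", neighbour_overlap(space_a, space_random))
```

The inputs can be the original high-dimensional embeddings or the 2-D layouts; the score only depends on which items are neighbours, not on the axes.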

[deleted] t1_jbyaq18 wrote on March 12, 2023 at 5:40 PM

[deleted]

polandtown t1_jbu56lb wrote on March 11, 2023 at 7:02 PM

Thanks!

[deleted] t1_jbtcsig wrote on March 11, 2023 at 3:43 PM