quitenominal t1_jbtr6g7 wrote
Reply to comment by Simusid in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
FWIW this has also been my finding when comparing these two embeddings for classification tasks. The OpenAI embeddings were better, but not by enough to justify the cost.
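For anyone curious what a comparison like this looks like in practice: you fit the same simple classifier on each set of embeddings and compare accuracy. A rough sketch below - the texts, labels, and model name are illustrative placeholders, not the original experiment:

```python
# Illustrative sketch: use sentence embeddings as features for a classifier.
# The texts/labels are toy placeholders, not real evaluation data.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = np.array([1, 0, 1, 0])

# SentenceTransformer embeddings; the OpenAI side of the comparison would
# swap in vectors from OpenAI's embeddings endpoint instead.
features = SentenceTransformer("all-mpnet-base-v2").encode(texts)

scores = cross_val_score(LogisticRegression(max_iter=1000), features, labels, cv=2)
print("mean accuracy:", scores.mean())
```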
quitenominal t1_jbtqio0 wrote
Reply to comment by Simusid in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
Nice explainer! I think this is good for those with some familiarity with linear algebra. I've added a further explanation below that goes one level simpler.
quitenominal t1_jbtptri wrote
Reply to comment by deliciously_methodic in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
An embedding is a numerical representation of some data. In this case the data is text.
These representations (read: lists of numbers) can be learned with some goal in mind. Usually you want the embeddings of similar data to be close to one another, and the embeddings of disparate data to be far apart.
Often these lists of numbers are very long - I think the ones from the model above are 768 numbers each. So each piece of text is transformed into a list of 768 numbers, and similar texts will get similar lists of numbers.
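To make that concrete, here's a minimal sketch using the sentence-transformers library. The model name is my assumption - all-mpnet-base-v2 is one model that outputs 768-dimensional vectors:

```python
# Minimal sketch: embed a few sentences and compare them.
# all-mpnet-base-v2 is one model that produces 768-dimensional embeddings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

sentences = [
    "The cat sat on the mat.",
    "A kitten was resting on the rug.",
    "Quarterly earnings beat expectations.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768) -- one list of 768 numbers per sentence

# Similar sentences end up with similar lists of numbers,
# i.e. a high cosine similarity between their vectors.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # lower: unrelated topic
```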
What's being visualized above is a 2-number summary of those 768. This is referred to as a projection, like how a 3D wireframe casts a 2D shadow. It lets us visualize the embeddings and gives a qualitative sense of their 'goodness' - a.k.a. are they grouping things as I expect? (Similar texts close together, disparate texts far apart.)
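If you want to see roughly how such a projection is made, here's a sketch using PCA from scikit-learn - just one common choice, and the plot above may well use something else like UMAP or t-SNE. It continues from the `embeddings` array in the previous snippet:

```python
# Sketch of projecting 768-dimensional embeddings down to 2 numbers each.
# PCA is one common technique; UMAP and t-SNE are popular alternatives.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# `embeddings` is the (n_sentences, 768) array from the previous snippet
coords = PCA(n_components=2).fit_transform(embeddings)  # shape: (n_sentences, 2)

plt.scatter(coords[:, 0], coords[:, 1])
plt.title("2D projection of 768-dim sentence embeddings")
plt.show()
```

If the embeddings are good, texts about the same topic should land near each other in the 2D plot.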
quitenominal t1_jdw15ao wrote
Reply to comment by esquire900 in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
It's in the terms that you can't use data generated through OpenAI's services to compete with OpenAI - and I believe they'd be able to argue competition if the trained model were used commercially.
See section 2.C.iii of https://openai.com/policies/terms-of-use