Comments


marcingrzegzhik t1_j6aic0m wrote

If you are looking for product-query similarity, you could try using a Word2Vec model. Train one on your dataset, then average the word vectors of each product title and each user query into a single vector per text and compare those. That gives you a concrete similarity score rather than just word-level matches.
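A minimal sketch of that, assuming gensim 4.x; the corpus, tokens, and hyperparameters here are toy placeholders, not your data:

```python
# Train Word2Vec on tokenized titles/queries, then compare averaged vectors.
import numpy as np
from gensim.models import Word2Vec

corpus = [
    ["wireless", "bluetooth", "headphones"],
    ["usb", "c", "charging", "cable"],
    ["noise", "cancelling", "earbuds"],
]
model = Word2Vec(corpus, vector_size=50, window=5, min_count=1, epochs=50)

def avg_vector(tokens, model):
    # Average the vectors of in-vocabulary tokens into one text vector
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

title = avg_vector(["bluetooth", "headphones"], model)
query = avg_vector(["wireless", "earbuds"], model)

# Cosine similarity between the two averaged vectors
print(np.dot(title, query) / (np.linalg.norm(title) * np.linalg.norm(query)))
```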

You can also try an embedding-based approach, such as an embedding layer in a neural network. This would let you learn more complex relationships between product titles and user queries.
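For instance, a two-tower setup with a shared embedding layer. This is only a sketch in PyTorch; the vocabulary size, dimensions, token ids, and contrastive loss are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    def __init__(self, vocab_size=10000, dim=64):
        super().__init__()
        # EmbeddingBag averages token embeddings into one vector per text
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, token_ids):
        return self.embed(token_ids)

encoder = TextEncoder()
title_ids = torch.tensor([[1, 42, 7]])   # token ids for a product title
query_ids = torch.tensor([[42, 99, 3]])  # token ids for a user query

title_vec = encoder(title_ids)
query_vec = encoder(query_ids)

# Contrastive objective: push matching title/query pairs together
loss_fn = nn.CosineEmbeddingLoss()
loss = loss_fn(title_vec, query_vec, torch.tensor([1.0]))  # 1 = matching pair
loss.backward()
```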

You could also try a matrix factorization technique such as Singular Value Decomposition (SVD) or Non-Negative Matrix Factorization (NMF). These methods can help you identify latent features in your dataset, which can then be used to generate better recommendations.
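A minimal sketch of the SVD variant with scikit-learn, using a tiny placeholder corpus and an arbitrary number of components:

```python
# Factor a TF-IDF matrix of titles/queries with truncated SVD, then
# compare texts in the latent space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "wireless bluetooth headphones",
    "usb c charging cable",
    "noise cancelling earbuds",
    "wireless earbuds",  # a user query appended to the corpus
]

tfidf = TfidfVectorizer().fit_transform(texts)
latent = TruncatedSVD(n_components=2).fit_transform(tfidf)

# Similarity of the query (last row) to each product title
print(cosine_similarity(latent[-1:], latent[:-1]))
```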

Hope this helps!

2

curiousshortguy t1_j6ajise wrote

> You can also try using an embedding-based approach, such as using an embedding layer in a neural network. This would enable you to learn more complex relationships between product titles and user queries.

He's already doing that with BERT.

2

curiousshortguy t1_j6ak1cj wrote

Why are you using Euclidean distance? Use cosine distance. The former cares about vector magnitude, the latter doesn't. As a general rule of thumb when comparing vector embeddings, you don't care about magnitude; at best it captures document length.
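A quick illustration of the difference, with made-up vectors standing in for embeddings:

```python
# Scaling a vector changes Euclidean distance but not cosine distance.
import numpy as np
from scipy.spatial.distance import cosine, euclidean

a = np.array([1.0, 2.0, 3.0])
b = 10 * a  # same direction, larger magnitude (e.g. a longer document)

print(euclidean(a, b))  # large: dominated by the magnitude difference
print(cosine(a, b))     # ~0.0: same direction, so treated as the same
```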

Do you have more than product titles, such as product descriptions? Where do the user queries come from? Do you use the default BERT tokenizer?

3