curiousshortguy t1_j6ak1cj wrote
Why are you using Euclidean distance? Use cosine distance instead. The former is sensitive to vector magnitude; the latter isn't. As a general rule of thumb when comparing vector embeddings, you don't care about magnitude: at best, it typically captures document length.
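To make the magnitude point concrete, here is a minimal sketch with made-up toy vectors (not real BERT embeddings). The names `query`, `same_direction` and `different_direction` are purely illustrative. One vector is the query scaled by 10, the other has the same length as the query but points elsewhere.

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance; grows with differences in magnitude.
    return float(np.linalg.norm(a - b))

def cosine_distance(a, b):
    # 1 - cosine similarity; depends only on the angle between a and b.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, assumed for illustration only.
query = np.array([1.0, 2.0, 3.0])
same_direction = 10.0 * query                    # same direction, 10x the magnitude
different_direction = np.array([3.0, 2.0, 1.0])  # same magnitude, different direction

for name, vec in [("same_direction", same_direction),
                  ("different_direction", different_direction)]:
    print(name,
          "euclidean:", round(euclidean_distance(query, vec), 3),
          "cosine:", round(cosine_distance(query, vec), 3))
```

With these vectors, Euclidean distance ranks `different_direction` as the closer match, while cosine distance ranks `same_direction` as the closer one (its cosine distance is essentially zero). If you L2-normalize all embeddings first, both distances produce the same ranking.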
Do you have more than product titles, such as product descriptions? Where do you get the user queries from? Do you use a default tokenizer for BERT?
lonelyrascal OP t1_j6odjl4 wrote
I have product brand, type, and color in addition to titles. Yes, I'll try cosine distance next. The user queries are just tests I ran myself, because there's no other way to evaluate this except A/B testing. Thank you.
curiousshortguy t1_j6oflky wrote
How do you use these other features? Do you just vectorize and sum the vectors? Or do you do something else?
I think you can leverage data from current production to create a labeled test dataset.
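As a rough sketch of what that could look like: assuming you can pull (query, clicked or purchased product) pairs out of production logs, you can score the search offline with something like recall@k before going to an A/B test. The queries, product IDs and the `labeled` / `results` dictionaries below are all hypothetical placeholders, and recall@k is just one possible metric.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant products that appear in the top-k results.
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(results, labeled, k):
    # Average recall@k over every labeled query.
    scores = [recall_at_k(results.get(q, []), rel, k) for q, rel in labeled.items()]
    return sum(scores) / len(scores)

# Hypothetical labels derived from production logs:
# query -> product IDs that users clicked or bought after that query.
labeled = {
    "red nike shoes": {"p1", "p4"},
    "blue adidas jacket": {"p7"},
}

# Hypothetical ranked output of the embedding search for the same queries.
results = {
    "red nike shoes": ["p1", "p2", "p4", "p9"],
    "blue adidas jacket": ["p3", "p5", "p7"],
}

print("recall@2:", mean_recall_at_k(results, labeled, 2))
print("recall@3:", mean_recall_at_k(results, labeled, 3))
```

Running the same labeled set against the Euclidean and cosine variants gives a direct offline comparison, although click data is biased toward whatever the current production ranking already shows.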