curiousshortguy t1_j6ak1cj wrote on January 28, 2023 at 11:10 PM

Why are you using Euclidean distance? Use cosine distance instead. The former is sensitive to vector magnitude; the latter isn't. As a general rule of thumb, when comparing vector embeddings you don't care about magnitude; at best, it typically captures document length.
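To make the magnitude point concrete, here's a minimal sketch with NumPy. The vectors are made up for illustration (they aren't real BERT embeddings), and the helper names `cosine_distance` and `euclidean_distance` are my own:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on the angle between a and b.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line distance: grows with any difference in magnitude.
    return np.linalg.norm(a - b)

# Toy vectors (assumed for illustration): same direction, 10x the magnitude.
query = np.array([1.0, 2.0, 3.0])
doc = 10.0 * query

cos_d = cosine_distance(query, doc)     # essentially zero
euc_d = euclidean_distance(query, doc)  # large, despite identical direction

print(f"cosine distance:    {cos_d:.6f}")
print(f"euclidean distance: {euc_d:.6f}")
```

The two vectors point in exactly the same direction, so cosine distance treats them as identical, while Euclidean distance reports them as far apart purely because of the scale difference.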

Do you have more than product titles, such as product descriptions? Where do you get the user queries from? Do you use a default tokenizer for BERT?

lonelyrascal OP t1_j6odjl4 wrote on January 31, 2023 at 7:28 PM

Besides titles, I have product brand, type, and color. Yes, I'll try cosine distance next. The user queries are just tests I ran myself, because there's no other way to evaluate short of A/B testing. Thank you.

curiousshortguy t1_j6oflky wrote on January 31, 2023 at 7:41 PM

How do you use these other features? Do you just vectorize and sum the vectors? Or do you do something else?

I think you can leverage data from current production to create a labeled test dataset.
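As a rough sketch of what I mean, assuming your production system logs which product a user clicked for each query (I don't know that it does): treat the clicked products as the relevant set for that query, then score any retrieval method with recall@k. The log format, `build_labeled_set`, `recall_at_k`, and `dummy_search` are all hypothetical names for illustration.

```python
from collections import defaultdict

# Hypothetical click log: (query text, id of the product the user clicked).
click_log = [
    ("red nike shoes", "p1"),
    ("red nike shoes", "p2"),
    ("blue adidas hoodie", "p3"),
    ("red nike shoes", "p1"),
]

def build_labeled_set(log):
    # Map each query to the set of products clicked for it.
    labels = defaultdict(set)
    for query, product_id in log:
        labels[query].add(product_id)
    return dict(labels)

def recall_at_k(labels, search, k):
    # Mean fraction of clicked products that appear in the top-k results.
    scores = []
    for query, relevant in labels.items():
        retrieved = set(search(query)[:k])
        scores.append(len(retrieved & relevant) / len(relevant))
    return sum(scores) / len(scores)

def dummy_search(query):
    # Stand-in for the real embedding search; always returns the same ranking.
    return ["p1", "p3", "p2"]

labels = build_labeled_set(click_log)
score = recall_at_k(labels, dummy_search, k=2)
print(f"recall@2 = {score:.2f}")
```

Clicks are a noisy signal of relevance, so this is only a proxy, but it lets you compare Euclidean and cosine ranking offline on the same queries before running an A/B test.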