NonFocusNorm

NonFocusNorm t1_itq4vts wrote

I believe a robust backbone model is crucial, since the backbone is the feature extractor and determines how good your embeddings are. So I suggest using CLIP from OpenAI, a very strong model that works well for zero-shot learning tasks. I personally use it, and it surprisingly outperformed the others on a text-image retrieval task. Highly recommend you try it out.
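
To make the retrieval part concrete, here is a minimal sketch of how text-image retrieval works once you have embeddings: score every image embedding against the text query embedding with cosine similarity and sort. The 3-d vectors, the file names, and the helper names `cosine_similarity` and `rank_images` are made up for illustration. In a real setup the vectors would come from the backbone's image and text encoders (CLIP's, for example) and would be much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(text_embedding, emb)
        for name, emb in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings standing in for image-encoder outputs.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "cat.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical stand-in for the text-encoder output of "a photo of a dog".
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
print(ranking)  # ['dog.jpg', 'cat.jpg', 'car.jpg']
```

The ranking step stays the same whichever backbone you use, so the quality of the retrieval comes down to how good the embeddings are.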

3