Submitted by _Arsenie_Boca_ t3_10vwm8k in MachineLearning
_Arsenie_Boca_ OP t1_j7miglb wrote
Reply to comment by PassingTumbleweed in [D] Papers that inject embeddings into LMs by _Arsenie_Boca_
Thanks, good pointer. I am particularly interested in the different mechanisms by which the embeddings might be integrated into LMs. E.g. in PaLI and SimVLM, the external embeddings (here, image encodings) are simply treated as token embeddings. Others use modified attention mechanisms to potentially make better use of the information. Are you aware of any work that directly compares multiple integration mechanisms?
PassingTumbleweed t1_j7mlwls wrote
I'm not aware of any comparison. Maybe it doesn't matter that much?
PaLI feeds embeddings from the Vision Transformer to the LM after a linear projection layer. It allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
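A minimal sketch of that integration style, with toy modules standing in for the pretrained ViT and LM (all dimensions and module names here are illustrative assumptions, not PaLI's actual configuration):

```python
import torch
import torch.nn as nn

# Assumed toy sizes: ViT output dim, LM embedding dim, LM vocab size.
vit_dim, lm_dim, vocab = 768, 512, 1000

vit = nn.Linear(3 * 16 * 16, vit_dim)     # stand-in for a patch encoder
proj = nn.Linear(vit_dim, lm_dim)         # the linear projection layer
tok_emb = nn.Embedding(vocab, lm_dim)     # LM token embedding table

patches = torch.randn(1, 4, 3 * 16 * 16)  # 4 flattened 16x16 RGB patches
img_emb = proj(vit(patches))              # (1, 4, lm_dim)

tokens = torch.tensor([[5, 7, 9]])        # 3 text token ids
txt_emb = tok_emb(tokens)                 # (1, 3, lm_dim)

# The projected image embeddings are simply prepended to the text
# embeddings, so the LM consumes them as ordinary "soft" tokens.
seq = torch.cat([img_emb, txt_emb], dim=1)  # (1, 7, lm_dim)

# Because vit and proj are ordinary modules in the graph, any loss on
# the LM output backpropagates through them, tuning the image encoder
# end to end.
seq.sum().backward()
assert vit.weight.grad is not None
```

The design point the comment makes is visible here: nothing special is needed in the LM itself, and gradients reach the vision encoder for free.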
_Arsenie_Boca_ OP t1_j7ommq8 wrote
Yes, seamless joint training is definitely one of the perks. I will keep looking to see whether I can find anything on the effectiveness of different injection/fusion mechanisms.