Submitted by _Arsenie_Boca_ t3_10vwm8k in MachineLearning
_Arsenie_Boca_ OP t1_j7miglb wrote
Reply to comment by PassingTumbleweed in [D] Papers that inject embeddings into LMs by _Arsenie_Boca_
Thanks, good pointer. I am particularly interested in the different mechanisms by which the embeddings might be integrated into LMs. E.g. in PaLI and SimVLM, the external embeddings (here, image encodings) are simply treated as token embeddings. Others use modified attention mechanisms to potentially make better use of the information. Are you aware of any work that directly compares multiple integration mechanisms?
PassingTumbleweed t1_j7mlwls wrote
I'm not aware of any comparison. Maybe it doesn't matter that much?
PaLI feeds embeddings from the Vision Transformer to the LM after a linear projection layer. It allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
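A minimal sketch of that integration style, with toy modules standing in for the pretrained ViT and LM (all dimensions and module names here are illustrative assumptions, not PaLI's actual configuration):

```python
import torch
import torch.nn as nn

# Assumed toy sizes: ViT output dim, LM embedding dim, LM vocab size.
vit_dim, lm_dim, vocab = 768, 512, 1000

vit = nn.Linear(3 * 16 * 16, vit_dim)     # stand-in for a patch encoder
proj = nn.Linear(vit_dim, lm_dim)         # the linear projection layer
tok_emb = nn.Embedding(vocab, lm_dim)     # LM token embedding table

patches = torch.randn(1, 4, 3 * 16 * 16)  # 4 flattened 16x16 RGB patches
img_emb = proj(vit(patches))              # (1, 4, lm_dim)

tokens = torch.tensor([[5, 7, 9]])        # 3 text token ids
txt_emb = tok_emb(tokens)                 # (1, 3, lm_dim)

# The projected image embeddings are simply prepended to the text
# embeddings, so the LM consumes them as ordinary "soft" tokens.
seq = torch.cat([img_emb, txt_emb], dim=1)  # (1, 7, lm_dim)

# Because vit and proj are ordinary modules in the graph, any loss on
# the LM output backpropagates through them, tuning the image encoder
# end to end.
seq.sum().backward()
assert vit.weight.grad is not None
```

The design point the comment makes is visible here: nothing special is needed in the LM itself, and gradients reach the vision encoder for free.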
_Arsenie_Boca_ OP t1_j7ommq8 wrote
Yes, seamless joint training is definitely one of the perks. I will keep looking to see whether I can find anything on the effectiveness of different injection/fusion mechanisms.