JClub t1_jabyh73 wrote

GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2022, so it doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?
