currentscurrents t1_jdft0hp wrote

>They seem to refer to this model as text-only, contradicting the known fact that GPT-4 is multi-modal.

I noticed this in the original paper as well.

This probably means that they implemented multimodality the same way PaLM-E did: starting with a pretrained LLM.
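For anyone unfamiliar with the PaLM-E recipe, the core idea is to keep the pretrained LLM (mostly) frozen and project vision-encoder features into its token-embedding space, so images become "soft tokens" in the input sequence. A minimal sketch below; all module names and dimensions are illustrative, not from the GPT-4 report.

```python
# Sketch of a PaLM-E-style adapter: project pooled vision features
# into a fixed number of soft tokens in the LLM's embedding space.
import torch
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    def __init__(self, vision_dim=1024, llm_embed_dim=4096, num_tokens=32):
        super().__init__()
        # One linear map producing num_tokens embeddings per image
        self.proj = nn.Linear(vision_dim, num_tokens * llm_embed_dim)
        self.num_tokens = num_tokens
        self.llm_embed_dim = llm_embed_dim

    def forward(self, image_features):  # (batch, vision_dim)
        tokens = self.proj(image_features)
        return tokens.view(-1, self.num_tokens, self.llm_embed_dim)

# Usage idea: prepend the projected image tokens to the text embeddings
# and run the combined sequence through the (frozen) LLM as usual:
#   image_tokens = adapter(vision_encoder(pixels))
#   inputs = torch.cat([image_tokens, text_embeddings], dim=1)
```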

57

was_der_Fall_ist t1_jdgmd2t wrote

As far as I understand, that’s exactly what they did. That’s why the public version of GPT-4 is text-only so far. The vision part came after.

16

JohnFatherJohn t1_jdje7cl wrote

Perhaps they're saying that because it can only output text; multimodality is limited to images and text as inputs.

2

SatoshiNotMe t1_jdke4cu wrote

How do you input images to GPT-4? Via the API?

1

JohnFatherJohn t1_jdkik7r wrote

It's not available to the public yet; access is restricted to specific groups that are conducting research.
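For illustration, a request would presumably look something like this once access opens up. This is a hypothetical sketch assuming an OpenAI-style chat API that accepts image URLs alongside text; the model name and field names are assumptions, not documented GPT-4 API behavior.

```python
# Hypothetical image + text request against an OpenAI-style chat API.
# Field names and model name are illustrative assumptions.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",  # assumes a vision-enabled variant once it's public
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```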

1