currentscurrents t1_jdft0hp wrote
>They seem to refer to this model as text-only, contradicting the known fact that GPT-4 is multi-modal.
I noticed this in the original paper as well.
This probably means they implemented multimodality the same way PaLM-E did: starting with a pretrained text-only LLM and adding vision on top.
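A minimal sketch of that recipe, assuming a PaLM-E-style design: a frozen pretrained LLM plus a small trainable projection that maps vision-encoder features into the LLM's token-embedding space, so image features are consumed like extra word embeddings. Every dimension and module name below is an illustrative placeholder, not GPT-4's actual architecture.

```python
import torch
import torch.nn as nn

class VisionToLLMAdapter(nn.Module):
    """Trainable bridge from a pretrained vision encoder to a frozen LLM."""

    def __init__(self, vision_dim=768, llm_dim=4096, num_prefix_tokens=32):
        super().__init__()
        # Projects one image embedding into a short sequence of "soft
        # tokens" that the frozen LLM treats like ordinary token embeddings.
        self.proj = nn.Linear(vision_dim, llm_dim * num_prefix_tokens)
        self.num_prefix_tokens = num_prefix_tokens
        self.llm_dim = llm_dim

    def forward(self, image_features):
        # image_features: (batch, vision_dim), e.g. pooled CLIP/ViT output
        soft_tokens = self.proj(image_features)
        return soft_tokens.view(-1, self.num_prefix_tokens, self.llm_dim)

adapter = VisionToLLMAdapter()
image_features = torch.randn(1, 768)        # stand-in for a vision encoder
text_embeddings = torch.randn(1, 20, 4096)  # stand-in for LLM token embeds

# Prepend the projected image tokens to the text embeddings; the combined
# sequence is then fed through the (frozen) pretrained LLM as usual.
inputs = torch.cat([adapter(image_features), text_embeddings], dim=1)
print(inputs.shape)  # torch.Size([1, 52, 4096])
```

Only the adapter needs training here, which is what makes "start with a pretrained LLM" cheap relative to training a multimodal model from scratch.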
was_der_Fall_ist t1_jdgmd2t wrote
As far as I understand, that’s exactly what they did. That’s why the public version of GPT-4 is text-only so far. The vision part came after.
JohnFatherJohn t1_jdje7cl wrote
Perhaps they're saying that because it can only output text. Multimodality is limited to text and images as inputs.
SatoshiNotMe t1_jdke4cu wrote
How do you input images to GPT-4? Via the API?
JohnFatherJohn t1_jdkik7r wrote
It's not available to the public yet; access is restricted to specific groups conducting research.
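For reference, when OpenAI later did expose image input through the API, the request shape looked roughly like the sketch below (OpenAI Python SDK, with the `gpt-4-vision-preview` model name from that release; none of this was publicly available at the time of this thread). Note the output is still text only, matching the point above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)  # the model replies in text only
```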