loopuleasa t1_jdhrwkv wrote on March 24, 2023 at 2:18 PM

GPT-4 is not publicly multimodal, though.

farmingvillein t1_jdhua51 wrote on March 24, 2023 at 2:34 PM

Hmm, what do you mean by "publicly"? OpenAI has publicly stated that GPT-4 is multi-modal, and that they simply haven't exposed the image API yet.

The image API isn't publicly available yet, but it is clearly coming.

loopuleasa t1_jdhuit0 wrote on March 24, 2023 at 2:36 PM

I'm talking about consumer access to the image API.

That's tricky, as the system is already swamped with text.

they mentioned an image takes 30 seconds to "comprehend" by the model...

MysteryInc152 t1_jdj8x5e wrote on March 24, 2023 at 7:59 PM

>they mentioned an image takes 30 seconds to "comprehend" by the model...

Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.

Now I'm wondering if they're just doing something like this: https://github.com/microsoft/MM-REACT

yashdes t1_jdij1tl wrote on March 24, 2023 at 5:12 PM

these models are very sparse, meaning very few of the calculations actually affect the output. My guess is that trimming the model is how they got gpt3.5-turbo, and I wouldn't be surprised if gpt4-turbo is coming.

farmingvillein t1_jdj9w98 wrote on March 24, 2023 at 8:05 PM

> these models are very sparse

Hmm, do you have any sources for this assertion?

It isn't entirely unreasonable, but 1) GPU speed-ups for sparsity aren't that high (unless OpenAI is doing something crazy secret/special... possible?), so this isn't actually that big of an upswing (unless we're including MoE?), and 2) OpenAI hasn't released architecture details (beyond the original GPT-3 paper, which did not indicate that the model was "very" sparse).

SatoshiNotMe t1_jdkd8l5 wrote on March 25, 2023 at 12:46 AM

I’m curious about this as well. I see it’s multimodal but how do I use it with images? The ChatGPT Plus interface clearly does not handle images. Does the API handle images?

farmingvillein t1_jdkdjye wrote on March 25, 2023 at 12:48 AM

> I see it’s multimodal but how do I use it with images?

You unfortunately can't right now--the image handling is not publicly available, although supposedly the model is capable.