Empty-Revolution7570 OP t1_jcdtuff wrote
Reply to comment by MysteryInc152 in [P] Multimedia GPT: Can ChatGPT/GPT-4 be used for vision / audio tasks just by prompt engineering? by Empty-Revolution7570
Yes, I included all the VFMs. I added a few more on top of those, such as OpenAI Whisper. I'm still exploring how to incorporate video models.
MysteryInc152 t1_jcduvhn wrote
I'm sorry, maybe I wasn't clear, but you obviously have API access to GPT-4, right? Does this access include an API call to their vision model? Or are you sending the images straight to BLIP and the like?
Empty-Revolution7570 OP t1_jcdv1nt wrote
No, it understands images through other models on Hugging Face, and outputs images with diffusers or OpenAI DALL-E.
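Roughly, the routing idea looks like the sketch below. This is not the project's actual code: `caption_image`, `transcribe_audio`, `TOOLS`, and `build_prompt` are hypothetical names, and the two model functions are stubs standing in for a captioning model such as BLIP and a speech model such as Whisper. The point is only that the media file is turned into text first, so a text-only LLM never sees the raw image or audio.

```python
import os
from typing import Callable, Dict


def caption_image(path: str) -> str:
    # Stub (assumption): a real version would run an image-captioning model.
    return f"[caption of {path}]"


def transcribe_audio(path: str) -> str:
    # Stub (assumption): a real version would run a speech-to-text model.
    return f"[transcript of {path}]"


# Hypothetical registry mapping file extensions to the tool that handles them.
TOOLS: Dict[str, Callable[[str], str]] = {
    ".png": caption_image,
    ".jpg": caption_image,
    ".wav": transcribe_audio,
    ".mp3": transcribe_audio,
}


def build_prompt(question: str, attachment: str) -> str:
    """Convert a media file to text and prepend it to the user's question."""
    suffix = os.path.splitext(attachment)[1].lower()
    tool = TOOLS.get(suffix)
    if tool is None:
        raise ValueError(f"no tool registered for {suffix!r}")
    observation = tool(attachment)
    # The resulting string is what would be sent to the text-only LLM.
    return f"Observation from {tool.__name__}: {observation}\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is in it?", "cat.png"))
```

Swapping a stub for a real model should only require keeping the same "path in, text out" signature.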