MysteryInc152 t1_jcdthob wrote on March 16, 2023 at 2:41 AM

Are you using GPT-Vision? Or is there a separate assortment of visual foundation models?

Empty-Revolution7570 OP t1_jcdtuff wrote on March 16, 2023 at 2:43 AM

Yes, I included all the VFMs, and on top of those I added a few more, such as OpenAI Whisper. I'm still exploring how to incorporate video models.

MysteryInc152 t1_jcduvhn wrote on March 16, 2023 at 2:51 AM

I'm sorry, maybe I wasn't clear, but you obviously have API access to GPT-4, right? Does this access include an API call to their vision model? Or are you sending the images straight to BLIP and the like?

Empty-Revolution7570 OP t1_jcdv1nt wrote on March 16, 2023 at 2:53 AM

No, it understands images through other models on Hugging Face, and it outputs images with diffusers or OpenAI DALL-E.
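A minimal sketch of the pattern described above, assuming the LLM itself is text-only: every attachment is first turned into text by a separate foundation model, and image output is delegated to a generator. All names here (`caption_image`, `transcribe_audio`, `generate_image`, `Orchestrator`, and the "DRAW:" reply convention) are hypothetical placeholders, not the OP's code; a real version would wrap a Hugging Face captioner such as BLIP, Whisper, and diffusers or DALL-E.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins for real models. They return fixed strings so the
# routing logic can be shown without downloading anything.
def caption_image(path: str) -> str:
    # Placeholder for an image-captioning model (e.g. BLIP on Hugging Face).
    return f"a caption describing {path}"


def transcribe_audio(path: str) -> str:
    # Placeholder for a speech-to-text model (e.g. OpenAI Whisper).
    return f"a transcript of {path}"


def generate_image(prompt: str) -> str:
    # Placeholder for a text-to-image model (diffusers or DALL-E);
    # returns a made-up output filename derived from the prompt.
    return prompt.replace(" ", "_") + ".png"


@dataclass
class Orchestrator:
    """Converts non-text inputs to text so a text-only LLM can reason over them."""

    input_tools: Dict[str, Callable[[str], str]] = field(
        default_factory=lambda: {
            ".png": caption_image,
            ".jpg": caption_image,
            ".wav": transcribe_audio,
            ".mp3": transcribe_audio,
        }
    )

    def to_text(self, path: str) -> str:
        # Pick the foundation model by file extension.
        for suffix, tool in self.input_tools.items():
            if path.lower().endswith(suffix):
                return tool(path)
        raise ValueError(f"no tool registered for {path}")

    def build_prompt(self, user_message: str, attachments: List[str]) -> str:
        # The LLM only ever sees text: each attachment is replaced by the
        # output of the matching model, followed by the user's message.
        lines = [f"[{p}]: {self.to_text(p)}" for p in attachments]
        lines.append(f"User: {user_message}")
        return "\n".join(lines)

    def handle_llm_reply(self, reply: str) -> str:
        # Hypothetical convention: the LLM asks for an image with "DRAW: <prompt>".
        if reply.startswith("DRAW:"):
            return generate_image(reply[len("DRAW:"):].strip())
        return reply


if __name__ == "__main__":
    orch = Orchestrator()
    print(orch.build_prompt("What is in this photo?", ["cat.jpg"]))
    # [cat.jpg]: a caption describing cat.jpg
    # User: What is in this photo?
    print(orch.handle_llm_reply("DRAW: a cat on a sofa"))
    # a_cat_on_a_sofa.png
```

The point of the design is that vision never touches the LLM directly: captioning happens before the prompt is built, and generation happens after the reply, so any text-only model can sit in the middle.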