Submitted by ItsTimeToFinishThis t3_1019dd1 in singularity
airduster_9000 t1_j2mg9rc wrote
Reply to comment by Ortus14 in Why can artificial intelligences currently only learn one type of thing? by ItsTimeToFinishThis
People assume that GPT-4 might be multimodal and able to handle more than just text. Since it's OpenAI, combining GPT, CLIP and DALL-E at some point seems like a given.
Akimbo333 t1_j2mimns wrote
Wow that's cool! What is CLIP?
airduster_9000 t1_j2mrgtl wrote
CLIP is the eyes that let it see images - not just read text and symbols.
GPT = Generative Pre-trained Transformer. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text: given an initial text as a prompt, it produces text that continues the prompt.
ChatGPT = A version of GPT-3.5 specially trained for chat.
DALL-E = DALL-E (stylized as DALL·E) and DALL-E 2 are deep learning models developed to generate digital images from natural language descriptions, called "prompts".
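CLIP = CLIP works roughly in the opposite direction from DALL-E: instead of generating an image from text, it matches images with text, scoring how well a given description fits a given image. Read more here: https://openai.com/blog/clip/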
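To make "autoregressive" in the GPT entry a bit more concrete, here's a toy sketch of the generation loop: predict the next token, append it, feed the longer text back in, repeat. The bigram table `NEXT_TOKEN_PROBS` and the `generate` function are made up for illustration; a real GPT uses a huge neural network that looks at the whole preceding text, not just the last word, and usually samples rather than always taking the top choice.

```python
# Toy illustration of autoregressive generation (hypothetical; NOT how GPT-3 is built).
# A hand-written bigram table stands in for the neural network that predicts
# a probability distribution over the next token.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt: list[str], max_new_tokens: int = 5) -> list[str]:
    """Greedily extend the prompt one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not probs:  # no known continuation for this token: stop early
            break
        # Take the most likely next token and append it, so it becomes
        # part of the context for the following step.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(" ".join(generate(["the"])))  # the cat sat down
```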
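For the CLIP entry: CLIP (Contrastive Language-Image Pre-training) turns images and text into vectors in a shared space, so "describing" an image can be done by picking, from a list of candidate captions, the one whose vector is closest to the image's. Below is a rough sketch of just that matching step. The embedding numbers and helper names are invented for the example; in real CLIP they come from trained image and text encoders.

```python
import math

# Hypothetical, hand-made embeddings standing in for CLIP's encoder outputs.
IMAGE_EMBEDDING = [0.9, 0.1, 0.2]
CAPTION_EMBEDDINGS = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a diagram of a circuit": [0.0, 0.2, 0.9],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_caption(image_vec: list[float], captions: dict[str, list[float]]) -> str:
    """Return the candidate caption whose embedding best matches the image."""
    return max(captions, key=lambda c: cosine_similarity(image_vec, captions[c]))

print(best_caption(IMAGE_EMBEDDING, CAPTION_EMBEDDINGS))  # a photo of a dog
```

Note that this only chooses among captions you supply; it doesn't write new text from scratch.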
Akimbo333 t1_j2mt3d5 wrote
Cool thanks for the info!!!