airduster_9000 t1_j2mg9rc wrote on January 2, 2023 at 12:16 PM

Reply to comment by Ortus14 in Why can artificial intelligences currently only learn one type of thing? by ItsTimeToFinishThis

People assume that GPT-4 might be multi-modal and able to handle more than just text. Since it's OpenAI, combining GPT, CLIP and DALL-E at some point seems a given.

Akimbo333 t1_j2mimns wrote on January 2, 2023 at 12:43 PM

Wow that's cool! What is CLIP?

airduster_9000 t1_j2mrgtl wrote on January 2, 2023 at 2:11 PM

CLIP is the eyes that let it see images - not just read text and symbols.

GPT = Generative Pre-trained Transformer. GPT-3 is an autoregressive language model that uses deep learning to produce human-like text. Given an initial text as a prompt, it produces text that continues the prompt.

ChatGPT = A version of GPT-3.5 specially trained for chat.

DALL-E = DALL-E (stylized as DALL·E) and DALL-E 2 are deep learning models developed to generate digital images from natural language descriptions, called "prompts".

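CLIP = CLIP goes roughly the other way from DALL-E: instead of turning text into an image, it takes an image and scores how well candidate text descriptions match it. It does not write captions on its own; it ranks the ones you give it. Read more here: https://openai.com/blog/clip/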
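
To make the CLIP part a bit more concrete: in practice, getting a description out of CLIP means embedding the image and a set of candidate captions into the same vector space and picking the caption that sits closest to the image. Here is a toy sketch of that ranking step in Python. The vectors are made up for illustration and the function names (`cosine`, `rank_captions`) are hypothetical; real CLIP produces the embeddings with learned image and text encoders, which are not shown here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(image_vec, caption_vecs):
    """Return (caption, score) pairs sorted from best match to worst."""
    scores = {cap: cosine(image_vec, vec) for cap, vec in caption_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up embeddings (assumption): stand-ins for encoder outputs.
image_vec = [0.9, 0.1, 0.2]
caption_vecs = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a cat": [0.1, 0.9, 0.2],
    "a diagram of a circuit": [0.1, 0.2, 0.9],
}

best_caption, best_score = rank_captions(image_vec, caption_vecs)[0]
print(best_caption)
```

With these toy numbers the dog caption comes out on top, because its vector points in nearly the same direction as the image vector.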

Akimbo333 t1_j2mt3d5 wrote on January 2, 2023 at 2:24 PM

Cool thanks for the info!!!