VirtualHat t1_j6bi3xk wrote on January 29, 2023 at 3:36 AM

Reply to comment by visarga in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0

Video and audio might be the next frontier, although I'm not sure how useful they would be. YouTube receives over 500 hours of uploads per minute, providing an essentially unlimited pipe of training data.

luaks1337 t1_j6chxhv wrote on January 29, 2023 at 10:18 AM

Also, spoken words differ a lot from thoughtful written text. Training on a 1:1 transcription would yield poor results in terms of grammar and readability. They could solve this by using a GPT model to rewrite the transcription, but then you're training AI on AI output, which could introduce bias.

VirtualHat t1_j6ckblf wrote on January 29, 2023 at 10:51 AM

I was thinking of next-frame prediction, perhaps conditioned on the text description or maybe a transcript. The idea is that you could then use the model to generate a video from a text prompt.

I suspect this is far too difficult to achieve with current algorithms. It's just interesting that the training data is all there, and would be many, many orders of magnitude larger than GPT-3's training set.

luaks1337 t1_j6clz9v wrote on January 29, 2023 at 11:14 AM

Ah, I thought you meant that video and audio would be the next step for text mining.

I believe OpenAI confirmed that they are already working on a text-to-video model. My guess would be that current algorithms could do it, but that it would be far too expensive to train on videos.