Takadeshi t1_itofgcb wrote on October 25, 2022 at 3:14 AM

Being able to generate cohesive video? Probably 3 years or less, honestly. But a movie with its own music, a coherent plot, acting, etc.? That seems a long way off to me; at that point you basically have an LLM that is a better writer, director, actor, and musician than the majority of humans. I think for that you're probably going to need something with near-human-level intelligence, and you're also going to need a system that works across language, visual, and audio data, which is outside the scope of LLMs.

Maybe you could make a "writer-bot" that writes the story, then a "video bot" that makes video from a long text input (input size is another current limitation of LLMs, so it would be difficult to feed a whole movie script into a model and expect good results), then an "audio bot" that takes the video and composes suitable music for the parts of the movie where music makes sense.