ChronoPsyche t1_itfdeef wrote
Reply to comment by LittleTimmyTheFifth5 in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
LLMs cannot write feature-length scripts yet. Not even close. They have a context-window problem they need to sort out first: the window is simply too small to hold a script.
xirzon t1_itfkplo wrote
The paper "Re3: Generating Longer Stories With Recursive Reprompting and Revision" shows some interesting strategies to work around that limitation by imitating aspects of a systematic human writing process to keep a story consistent, detect errors, etc.: https://arxiv.org/abs/2210.06774
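The Re3 idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: `generate` is a stand-in stub for a real LLM call, and the structure just shows the trick of keeping the prompt bounded (fixed plan + rolling summary) instead of feeding the whole story back in.

```python
# Sketch of Re3-style recursive reprompting.
# NOTE: generate() is a hypothetical stub standing in for a real LLM call.
def generate(prompt: str) -> str:
    """Stub LLM: returns a canned continuation derived from the prompt."""
    return f"[text continuing: {prompt[:40]}...]"

def write_story(premise: str, num_passages: int = 3) -> str:
    # Plan step: ask the model for a high-level outline once.
    plan = generate(f"Outline a story about: {premise}")
    passages, summary = [], ""
    for i in range(num_passages):
        # Each passage is prompted with the fixed plan plus a rolling summary,
        # so prompt length stays bounded even as the story grows.
        prompt = (f"Plan: {plan}\n"
                  f"Story so far (summary): {summary}\n"
                  f"Write passage {i + 1}:")
        passage = generate(prompt)
        passages.append(passage)
        # Re-summarize so far, again keeping the state a fixed size.
        summary = generate(f"Summarize: {summary} {passage}")
    return "\n".join(passages)
```

The paper adds revision and consistency-checking passes on top of this loop; the sketch only shows the recursive-reprompting skeleton.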
A similar approach is taken by the Dramatron system to create screenplays and theatre scripts: https://arxiv.org/abs/2209.14958
In combination with the more systematic improvements to LLM architecture you hint at, plus next-gen models, we might see coherent long-form storytelling sooner than expected (with full-length graphic novels, perhaps, as the first visual art form).
ChronoPsyche t1_itflq78 wrote
Oh, there are certainly workarounds! I agree 100%. But they are just that: workarounds. We won't be able to leverage the full power of long-form content generation until we solve the underlying memory issues.
Which is fine. There are still so many advances that can be made within the current limitations.
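To put rough numbers on the memory issue: a feature screenplay simply doesn't fit in a 2022-era context window. The figures below are back-of-envelope assumptions (page count, words per page, tokens per word), not measurements.

```python
# Back-of-envelope: why a feature script exceeds a 2022-era context window.
# Assumed figures (rough): ~110 script pages, ~180 words/page, ~1.3 tokens/word.
pages, words_per_page, tokens_per_word = 110, 180, 1.3
script_tokens = int(pages * words_per_page * tokens_per_word)

context_window = 4096  # tokens; typical large-model limit at the time
print(script_tokens)                    # ~25,740 tokens for the whole script
print(script_tokens // context_window)  # the script spans several windows
```

Whatever the exact numbers, the script is several context windows long, so the model can never "see" the whole story at once without some workaround.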
visarga t1_itgqug0 wrote
There is also exponentially less long-form content than short-form content: the longer the text, the fewer samples we have to train on.
LittleTimmyTheFifth5 t1_itfdvcd wrote
That's a shame. Though I wonder how long it will be till that's not a problem anymore.
visarga t1_itgqoj0 wrote
There are workarounds for long inputs. One is the efficient-attention transformer family (Linformer, Longformer, Big Bird, Performer, etc.); another is the Perceiver, which can reference a long input sequence through a fixed-size set of latent vectors.
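The Perceiver trick can be sketched numerically. A small, fixed set of latent vectors cross-attends to an arbitrarily long input, so the cost grows linearly in input length (O(n_latent × seq_len)) rather than quadratically (O(seq_len²)) as in vanilla self-attention. This is a toy NumPy sketch with made-up sizes, not the actual Perceiver implementation:

```python
import numpy as np

# Perceiver-style cross-attention sketch (toy sizes, assumed for illustration).
rng = np.random.default_rng(0)
d, n_latent, seq_len = 16, 8, 1000
latents = rng.normal(size=(n_latent, d))  # queries: small, fixed-size state
inputs = rng.normal(size=(seq_len, d))    # keys/values: long input sequence

# Scaled dot-product attention from latents to the full input.
scores = latents @ inputs.T / np.sqrt(d)            # (n_latent, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the input
out = weights @ inputs                              # (n_latent, d) summary

print(out.shape)  # (8, 16): the long input compressed into 8 latent vectors
```

Because the latent array stays the same size no matter how long the input is, the downstream transformer layers only ever operate on those few vectors.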