ChronoPsyche t1_itfdeef wrote
Reply to comment by LittleTimmyTheFifth5 in Given the exponential rate of improvement to prompt based image/video generation, in how many years do you think we'll see entire movies generated from a prompt? by yea_okay_dude
LLMs cannot write feature-length scripts yet. Not even close. They have a context-window problem they need to sort out first: the window is simply too small to hold a script.
xirzon t1_itfkplo wrote
The paper "Re3: Generating Longer Stories With Recursive Reprompting and Revision" shows some interesting strategies to work around that limitation by imitating aspects of a systematic human writing process to keep a story consistent, detect errors, etc.: https://arxiv.org/abs/2210.06774
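The Re3 idea above can be sketched in a few lines. This is a hypothetical illustration, not the paper's actual code: `generate` is a stand-in stub for a real LLM call, and the structure just shows the trick of keeping the prompt bounded (fixed plan + rolling summary) instead of feeding the whole story back in.

```python
# Sketch of Re3-style recursive reprompting.
# NOTE: generate() is a hypothetical stub standing in for a real LLM call.
def generate(prompt: str) -> str:
    """Stub LLM: returns a canned continuation derived from the prompt."""
    return f"[text continuing: {prompt[:40]}...]"

def write_story(premise: str, num_passages: int = 3) -> str:
    # Plan step: ask the model for a high-level outline once.
    plan = generate(f"Outline a story about: {premise}")
    passages, summary = [], ""
    for i in range(num_passages):
        # Each passage is prompted with the fixed plan plus a rolling summary,
        # so prompt length stays bounded even as the story grows.
        prompt = (f"Plan: {plan}\n"
                  f"Story so far (summary): {summary}\n"
                  f"Write passage {i + 1}:")
        passage = generate(prompt)
        passages.append(passage)
        # Re-summarize so far, again keeping the state a fixed size.
        summary = generate(f"Summarize: {summary} {passage}")
    return "\n".join(passages)
```

The paper adds revision and consistency-checking passes on top of this loop; the sketch only shows the recursive-reprompting skeleton.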
A similar approach is taken by the Dramatron system to create screenplays and theatre scripts: https://arxiv.org/abs/2209.14958
In combination with the more systematic improvements to LLM architecture you hint at, plus next-gen models, we might see coherent long-form storytelling sooner than expected (with full-length graphic novels, perhaps, as the first visual art form).
ChronoPsyche t1_itflq78 wrote
Oh, there are certainly workarounds! I agree 100%. But they are just that: workarounds. We won't be able to leverage the full power of long-form content generation until we solve the underlying memory issues.
Which is fine. There are still so many advances that can be made within the current limitations.
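To put rough numbers on the memory issue: a feature screenplay simply doesn't fit in a 2022-era context window. The figures below are back-of-envelope assumptions (page count, words per page, tokens per word), not measurements.

```python
# Back-of-envelope: why a feature script exceeds a 2022-era context window.
# Assumed figures (rough): ~110 script pages, ~180 words/page, ~1.3 tokens/word.
pages, words_per_page, tokens_per_word = 110, 180, 1.3
script_tokens = int(pages * words_per_page * tokens_per_word)

context_window = 4096  # tokens; typical large-model limit at the time
print(script_tokens)                    # ~25,740 tokens for the whole script
print(script_tokens // context_window)  # the script spans several windows
```

Whatever the exact numbers, the script is several context windows long, so the model can never "see" the whole story at once without some workaround.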
visarga t1_itgqug0 wrote
There is also exponentially less long-form content than short-form content: the longer the text, the fewer samples we have to train on.
LittleTimmyTheFifth5 t1_itfdvcd wrote
That's a shame. Though I wonder how long it will be till that's not a problem anymore.
visarga t1_itgqoj0 wrote
There are workarounds for long inputs. One is the efficient-attention transformer family (Linformer, Longformer, Big Bird, Performer, etc.); another is the Perceiver, which can reference a long input sequence through a fixed-size set of latent vectors.
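The Perceiver trick can be sketched numerically. A small, fixed set of latent vectors cross-attends to an arbitrarily long input, so the cost grows linearly in input length (O(n_latent × seq_len)) rather than quadratically (O(seq_len²)) as in vanilla self-attention. This is a toy NumPy sketch with made-up sizes, not the actual Perceiver implementation:

```python
import numpy as np

# Perceiver-style cross-attention sketch (toy sizes, assumed for illustration).
rng = np.random.default_rng(0)
d, n_latent, seq_len = 16, 8, 1000
latents = rng.normal(size=(n_latent, d))  # queries: small, fixed-size state
inputs = rng.normal(size=(seq_len, d))    # keys/values: long input sequence

# Scaled dot-product attention from latents to the full input.
scores = latents @ inputs.T / np.sqrt(d)            # (n_latent, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the input
out = weights @ inputs                              # (n_latent, d) summary

print(out.shape)  # (8, 16): the long input compressed into 8 latent vectors
```

Because the latent array stays the same size no matter how long the input is, the downstream transformer layers only ever operate on those few vectors.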