Viewing a single comment thread. View all comments

red75prime t1_is10a00 wrote

Superficially similar maybe. There are real technical reasons why you can get a pretty picture using the existing technology, but cannot employ the same technology to analyze a small codebase (say, 10000 lines of code).

With no operating memory other than its input buffer a transformer model is limited in the amount of information it can attend to.

For pictures it's fine. You can describe a complex scene in a hundred or two of words. For code synthesis that is doing more than producing a code snippet you need to analyze thousands and millions of words (most of them are skipped, but you still need to attend to them even if briefly).

And here the limitation of transformers come into play. You cannot grow the size of input buffer too much, because required computations grow quadratically (no, not exponentially, but quadratic is enough when you need to run a supercomputer for months to train the network).

Yes, there are various attempts to overcome that, but it's not yet certain that any of them is the path forward. I'd give maybe 3% on something groundbreaking appearing in the next year.

1