Submitted by Not-Banksy t3_126a1dm in singularity
Zermelane t1_je8lss0 wrote
Reply to comment by FlyingCockAndBalls in When people refer to “training” an AI, what does that actually mean? by Not-Banksy
Better parallelism in training, and a more direct way to reference past information, than in RNNs (recurrent neural networks), which seemed like the "obvious" way to process text before transformers came along.
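Roughly, in code (a toy numpy sketch with made-up shapes and weights, not either architecture's real implementation): the RNN has to walk the sequence one step at a time because each hidden state depends on the previous one, while attention computes everything with a few big matrix multiplies and can look directly at any earlier position.

```python
import numpy as np

T, d = 6, 8                       # sequence length, hidden size (arbitrary)
x = np.random.randn(T, d)         # token embeddings

# RNN: each step depends on the previous hidden state -> inherently sequential.
W_h, W_x = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
rnn_out = []
for t in range(T):                # cannot parallelize over t during training
    h = np.tanh(h @ W_h + x[t] @ W_x)
    rnn_out.append(h)

# Causal self-attention: every position attends to all earlier positions at once.
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v           # big matmuls, parallel over t
scores = Q @ K.T / np.sqrt(d)
scores[np.triu_indices(T, k=1)] = -np.inf     # mask out future positions
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
attn_out = weights @ V            # direct, weighted access to any past token
```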
These days we have RNN architectures that can achieve transformer-like training parallelism, the most interesting-looking one being RWKV. They are still badly disadvantaged when they need information directly from the past, for instance to repeat a name that's been mentioned before, but they have other advantages, and their performance gets close enough to transformers that which architecture ends up winning out could just come down to scaling exponents.
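To illustrate the parallelism point, here's a toy sketch of a plain decaying linear recurrence (not RWKV's actual, more elaborate formulation): because the recurrence is linear, all the hidden states can be computed in one matrix multiply instead of a step-by-step loop, which is the kind of trick that makes transformer-like training parallelism possible.

```python
# h_t = decay * h_{t-1} + x_t  has the closed form  h_t = sum_{i<=t} decay**(t-i) * x_i,
# so the whole sequence of states can be computed at once.
import numpy as np

T, decay = 6, 0.9                 # arbitrary toy values
x = np.random.randn(T)

# Sequential form (how you'd run it step by step at inference time)
h, seq = 0.0, []
for t in range(T):
    h = decay * h + x[t]
    seq.append(h)

# Parallel form: a lower-triangular matrix of decay powers, one matmul
i, j = np.indices((T, T))
A = np.where(i >= j, decay ** (i - j), 0.0)
par = A @ x

assert np.allclose(seq, par)      # both forms give the same hidden states
```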