
DungeonsAndDradis t1_iragf3p wrote

This recent* explosion of AI progress was made possible by the Transformer architecture (introduced in the paper "Attention Is All You Need": https://arxiv.org/abs/1706.03762). I think we're approaching the limits of that architecture.

Companies have stopped trying to go ever bigger on data and parameter counts, and are now trying to streamline training to get more out of less. I believe GPT-4 is only going to be trained on something like 10% of the data, but it's still expected to be a significant improvement over GPT-3.

I assume that the next "big thing" in AI is what will kick us into high gear towards AGI.

Some researcher, in some lab in Silicon Valley or academia, is probably writing and revising the paper on "big thing 2.0" right now. It will probably be called something like "Self-training and recursion of knowledge".

*since 2017

3

Effective-Dig8734 t1_irbx4o1 wrote

I think what you're thinking of is that GPT-4 will have a similar number of parameters, but it will be trained on far more data.

3

MasterFubar t1_irajhs5 wrote

An interesting thing about transformers is that they are simpler than the LSTMs that came before them. Problems like vanishing gradients set limits on how complex a neural network can be.
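For anyone curious what "simpler" means in practice, here's a rough PyTorch sketch (the framework and sizes are my own assumptions, nothing from the paper): an LSTM has to carry information step by step through the whole sequence, while a single Transformer encoder layer lets every token attend to every other token directly.

```python
# Minimal sketch (assumes PyTorch is installed; dimensions are arbitrary placeholders)
# contrasting a recurrent LSTM layer with a Transformer encoder layer.
import torch
import torch.nn as nn

seq_len, batch, d_model = 128, 8, 512
x = torch.randn(seq_len, batch, d_model)  # (sequence, batch, features)

# LSTM: processes the sequence one step at a time, so gradients must flow back
# through all 128 recurrent steps -- the setting where vanishing gradients bite.
lstm = nn.LSTM(input_size=d_model, hidden_size=d_model)
lstm_out, _ = lstm(x)

# Transformer encoder layer: self-attention gives a direct (one-hop) path between
# any two positions, so the gradient path length doesn't grow with sequence length.
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
attn_out = encoder_layer(x)

print(lstm_out.shape, attn_out.shape)  # both torch.Size([128, 8, 512])
```

Both layers map the same input to the same output shape; the difference is that the LSTM's backward pass has to traverse every time step, while the attention layer's doesn't.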

2