Submitted by flowday t3_10gxy2t in singularity
genshiryoku t1_j55pz1h wrote
Scaling up transformer models like GPT isn't going to result in AGI. Almost every AI expert, including the researchers working at OpenAI, agrees with this.
We need a new architecture, one with both short-term and long-term memory, multi-modality, and less need for training data, for us to reach AGI.
The current path of scaling up transformer models will stagnate at GPT-4 or GPT-5 because we simply don't have enough data on the collective internet for us to keep scaling it further than that.
MrEloi t1_j55svcd wrote
>we simply don't have enough data on the collective internet for us to keep scaling it further than that.
Why do we need more data? We already have a lot.
We now need to work more on the run-time aspects, e.g. short-term and long-term memory, etc.
ElvinRath t1_j56gy3g wrote
Either we need new architectures or more data.

Right now, even if we somehow put YouTube into text, which could be done, there is just not enough data to efficiently train a 1T-parameter model.
And in text form alone, there is probably not enough even for 300B...

So yeah, there is not enough data to keep scaling up.

It might be different with multimodal, I don't know about that.
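As a rough back-of-the-envelope (a sketch only; the ~20 tokens-per-parameter ratio is the Chinchilla rule of thumb, taken here as an assumption rather than a hard law):

```python
# Back-of-the-envelope: compute-optimal training-data needs at different
# model sizes, using the ~20 tokens/parameter Chinchilla (Hoffmann et al.,
# 2022) rule of thumb. Treat the ratio as an assumption, not a hard law.
TOKENS_PER_PARAM = 20

for params in (300e9, 1e12):
    needed_tokens = params * TOKENS_PER_PARAM
    print(f"{params / 1e9:>5.0f}B params -> ~{needed_tokens / 1e12:.0f}T tokens "
          "to train compute-optimally")
```

Whether ~6T or ~20T tokens actually outruns the usable text on the internet depends on whose estimate of the text stock you go with, which is exactly the thing in dispute here.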
genshiryoku t1_j55te6y wrote
Because GPT-3 was trained on almost all publicly available data, and GPT-4 will be trained by transcribing all the video footage on the internet and feeding the transcripts to it.
You can't scale the model up without scaling the training data with it. The bottleneck is the training data and we're running out of it.
It's not like the internet is suddenly going to 10x in size over the next couple of years. Especially as the global population is shrinking and most people are already connected online so not a lot of new data is made.
Surur t1_j566uy8 wrote
The next step is real-time experiential data, from live video cameras, robot bodies, self-driving cars.
genshiryoku t1_j568vwe wrote
That's not a whole lot of data, and it doesn't compare to the gargantuan amount of data already on the internet, generated over decades.
The current transformer model scaling will hit a wall soon due to lack of training data.
Clawz114 t1_j56nnoj wrote
>Because GPT-3 was trained on almost all publicly available data
GPT-3 was trained with around 45TB of data, which is only around 10% of the Common Crawl database that makes up 60% of GPT-3's training dataset.
>Especially as the global population is shrinking and most people are already connected online so not a lot of new data is made.
The global population is growing and expected to continue growing until just over the 10 billion mark?
Gohoyo t1_j57azvh wrote
> It's not like the internet is suddenly going to 10x in size over the next couple of years. Especially as the global population is shrinking and most people are already connected online so not a lot of new data is made.
I don't get this. Can't AI generate more data for itself in like, a year, than all human communications since the dawn of the internet? Why would the internet need to 10x in size if the population gets a hold of AI that increases the amount of content generated by x1000? Seems like you just need an AI that generates a fuck ton of content and then another one that determines what in that content is "quality". I am totally ignorant here, I just find the 'running out of data' thing quite strange.
genshiryoku t1_j57bbc9 wrote
You can't use AI-generated data to train AI, because that data is essentially already drawn from the model's own training set. Training on synthetic data like that acts like "overfitting" and reduces the performance and effectiveness of the AI.
Gohoyo t1_j57c8yb wrote
Does this mean it only learns from novel information it takes in? As in it can never learn anything about cat conversations after the 10th conversation it reads about a cat? I mean, what's the difference between it reading about something it made versus reading something a person wrote that says something similar? I just can't figure out why you can't get around this by using AI somehow.
Like: AI A makes a billion terabytes of content.
AI B takes in content and makes it 'unique/new/special' somehow.
Give it back to AI A or even a new AI C.
genshiryoku t1_j57dtsz wrote
Without going too deep into it: this is a symptom of transformer models. My argument was about why transformer models like GPT can't keep scaling up.
It has to do with the mathematics behind training AI. Essentially, for every piece of data the AI refines itself, but for copies of data it overcorrects, which results in inefficiency or worse performance. Synthetic data acts much the same as duplicate data in that the model overcorrects and its performance gets worse.
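A toy illustration of the duplicate-data effect (plain numpy, nothing like actual transformer training, just to show the shape of the problem): fit a line to some data, then duplicate one sample a couple hundred times and fit again; the repeated sample drags the whole fit toward itself.

```python
# Toy sketch (numpy only, not real LM training): fit a line to data where
# one sample is duplicated many times. The duplicates dominate the loss,
# so the fit gets dragged toward them -- a crude analogue of how repeated
# or duplicate training data overweights part of the distribution.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=x.shape)  # true relation: y = 2x + 1

def fit_line(xs, ys):
    # Ordinary least squares: returns (slope, intercept).
    A = np.vstack([xs, np.ones_like(xs)]).T
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

print("clean fit (slope, intercept):          ", fit_line(x, y))

# Now duplicate a single made-up sample 200 times, as if the same piece
# of data showed up over and over again in the training set.
x_dup = np.concatenate([x, np.full(200, 9.0)])
y_dup = np.concatenate([y, np.full(200, 5.0)])  # sits far below the true line
print("fit with duplicates (slope, intercept):", fit_line(x_dup, y_dup))
```

Run it and the second fit's slope ends up nowhere near the true value of 2, even though only one "fact" was repeated.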
If you are truly interested you can see for yourself here.
And yes AI researchers are looking for models to detect what data is synthetic on the internet because it's inevitable that new data will be machine generated which can't be used to train on. If we fail at that task we might even enter an "AI dark age" where models get worse and worse with time because the internet will be filled with AI generated garbage data that can't be trained on. Which is the worst case scenario.
Gohoyo t1_j57fu2a wrote
Thanks for trying to help me btw.
I watched the video. I can understand why reading its own data wouldn't work, but I can't understand why having it create a bunch of data and then altering the data, then giving it back to the AI, wouldn't. The key here is that we have machines that can create data at superhuman speeds. There has to be some way to do something with that data to make it useful to the AI again, right?
genshiryoku t1_j57h1fb wrote
The "created data" is merely the AI mixing the training data in such a way that it "creates" something new. If the dataset is big enough this looks amazing and like the AI is actually creative and creating new things but from a mathematics perspective it's still just statistically somewhere in between the data it already has trained on.
Therefor it would be the same as feeding it its own data. To us it seems like completely new, and actually useable data though which is why ChatGPT is so exciting. But for AI training purposes it's useless.
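A minimal sketch of that "statistically in between what it already saw" point (a Gaussian standing in for the model, with deliberately tiny samples so the effect shows up fast; this is a toy, not how anyone actually trains a language model): fit the model to data, sample from the fitted model, refit on those samples, and repeat. No new information ever enters the loop, and the learned distribution drifts and narrows over the generations.

```python
# Toy "model collapse" loop: each generation fits a Gaussian to samples
# drawn from the previous generation's Gaussian. Nothing new is learned,
# and the estimated spread tends to drift and shrink over generations.
import numpy as np

rng = np.random.default_rng(42)
real_data = rng.normal(loc=0.0, scale=1.0, size=50)   # the original "human" data
mu, sigma = real_data.mean(), real_data.std()

for gen in range(1, 31):
    synthetic = rng.normal(mu, sigma, size=50)         # train on the previous model's output
    mu, sigma = synthetic.mean(), synthetic.std()
    if gen % 5 == 0:
        print(f"generation {gen:2d}: mu={mu:+.3f}  sigma={sigma:.3f}")
```

Swap the Gaussian for a giant transformer and that same loop is, roughly, the worry behind the "AI dark age" scenario I mentioned above.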
Gohoyo t1_j57hihv wrote
If ChatGPT creates a paragraph and I then take that paragraph and alter it significantly, how is that new, never-before-seen-by-AI-or-humans paragraph not new data for the AI?
genshiryoku t1_j57j6s1 wrote
It would be lower quality data, but still usable if significantly altered. The question is: why would you do this instead of just generating real data?
GPT is trained on human language; it needs real interactions to learn from, like the one we're having right now.
I'm also not saying that this isn't possible. We are AGI-level intelligences, and we absolutely consumed less data over our lifetimes than GPT-3 did, so we know it's possible to reach AGI with relatively little data.
My original argument was merely that it's impossible with current transformer models like GPT, and that we need another breakthrough in AI architecture rather than just scaling them up, because the training data is going to run out over the next couple of years as all of the internet gets used up.
Gohoyo t1_j57jyq4 wrote
> Why would you do this instead of just generating real data?
The idea would be that harnessing the AI's ability to create massive amounts of regurgitated old data quickly and then transmuting it into 'new data' somehow is faster than acquiring real data.
I mean I believe you, I'm not in this field nor a genius, so if the top AI people are seeing it as a problem then I have to assume it really is, I just don't understand it fully.
docamazing t1_j59aaud wrote
I think you are incorrect here.
Baturinsky t1_j5697wk wrote
I think AI will train from people using it.
genshiryoku t1_j56btvq wrote
The problem is the total amount of data and the quality of the data. Humans using an AI like GPT-3 don't generate nearly enough data to properly train a new model, not even with decades of interaction.
The demand for training data grows right along with the parameter count of the transformer model, while the returns on each extra token diminish. This essentially means that, mathematically, transformer models are a losing strategy and aren't going to lead to AGI unless you have an unlimited amount of training data, which we don't.
We need a different architecture.
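To put a number on "losing strategy" (a sketch only, taking the data exponent from the Kaplan et al. 2020 scaling-law fits, roughly 0.095, as an assumption):

```python
# Sketch of diminishing returns under a power-law scaling fit,
# L(D) proportional to D ** (-alpha_D). The exponent value is the rough
# figure reported in Kaplan et al. (2020); treat it as an assumption.
ALPHA_D = 0.095

data_needed_to_halve_loss = 2 ** (1 / ALPHA_D)
print(f"~{data_needed_to_halve_loss:,.0f}x more data just to halve the loss")

gain_from_10x_data = 10 ** (-ALPHA_D)
print(f"10x more data only shrinks the loss to ~{gain_from_10x_data:.2f} of its old value")
```

The exact numbers aren't the point; the shape is: every constant-sized improvement costs a multiplicative pile of extra tokens, and the pile of human-written tokens is finite.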
[deleted] t1_j56o68x wrote
[deleted]