manOnPavementWaving

manOnPavementWaving t1_j25doir wrote

Building on YEARS of ideas. They were cool, but without transformers they wouldn't exist. Without infrastructure code, they wouldn't exist. Without years of hardware improvements, they wouldn't exist. Without the ideas of normalization and skip connections, they wouldn't exist. Etc. (And this isn't even counting all the alleys that were chased down only to find out they didn't work. That contribution is less visible, but it's definitely part of research.)

Gato didn't even have that much to show for it: the long hoped-for skill transfer wasn't really there. DALL-E 2 builds on CLIP and diffusion; ChatGPT builds on GPT-3 and years of RL research.

You're saying something along the lines of "x is better than what came before, so the step to x is bigger than the sum of all the steps before that", and that is the worst take I've ever heard. It's definitely not how research works.

And goddamn it, I'm getting déjà vu, because this bad take has been made on this subreddit before.

Is this rebuttal better? If it isn't, I'd be happy to go and list the essential moments in AI from the past decade.

7

manOnPavementWaving t1_j04bxs9 wrote

I agree that you can't extrapolate, but it's definitely not the case that GPT-4 has to have the same limitations as GPT-2 and GPT-3. Context-window issues can be resolved in a myriad of ways (my current favourite being this one), and retrieval-based methods could solve most of the factuality issues (while being very effective and cheap, as RETRO proved).
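(For anyone who hasn't seen the retrieval idea: here's a toy sketch of the control flow. The word-overlap scoring is just my stand-in for a real learned embedding index over a huge token database, it's not how RETRO actually works.)

```python
# Toy sketch of retrieval-augmented generation: instead of relying only on the
# model's weights for facts, fetch relevant passages and condition on them.
# Real systems use a nearest-neighbour index over embeddings; the word-overlap
# score below is just a placeholder to show the overall flow.

def score(query: str, passage: str) -> int:
    """Crude relevance score: number of words the query and passage share."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved passages so the model can ground its answer in them."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RETRO conditions a language model on chunks retrieved from a large text database.",
    "PaLM is a 540B parameter dense transformer trained by Google.",
    "Diffusion models generate images by iteratively denoising random noise.",
]
print(build_prompt("How does RETRO use retrieval?", corpus))
```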

So I want to re-emphasize that we have no clue how good it will be. It could very well smash previous barriers, but it could also be rather disappointing and very much like ChatGPT. We just don't know.

5

manOnPavementWaving t1_iw0c7h3 wrote

Most news around GPT-4 is. The "Cerebras partnership" was always just a mention of GPT-N by Cerebras to hype up their processor; OpenAI had no say in that. (Also not sure whether .e6 means 100k-1M or 1M-10M.) The only leak I'm sure came from Sam was "the model won't be much bigger than GPT-3 and will be text only", which I'd say is the only one to trust (although it could be outdated).

12

manOnPavementWaving t1_ityolvz wrote

They actually do invent tools, but that's not the important thing. What made humans intelligent is having a big brain and having lots of time. If we were to put a newborn human and a baby chimpanzee in a jungle and monitor them, they wouldn't seem all that different in terms of intelligence.

That's fine if you take it into your calculations, but then it can't be attributed to just the bigger brain. The problem is that the 100-trillion-parameter model won't have hundreds of thousands of years, or billions of copies of itself.

Cool reference, though! Interesting work.

1

manOnPavementWaving t1_itt6vrn wrote

That wasn't 1 year before the prediction of a hundred billion parameters, though. I'm not doubting that they'll come, I'm doubting the timeline.

I'm interested in why you think a 10-trillion-parameter model would be human-level AGI.

3

manOnPavementWaving t1_itt06eo wrote

With the H100, training time optimistically improves by only a factor of 9. That's not nearly enough to bridge the roughly 200x gap between the current largest models and a 100-trillion-parameter model, and that's parameter scaling alone, ignoring data scaling. PaLM training took 1200 hours on 6144 TPU v4 chips, plus an additional 336 hours on 3072 TPU v4 chips. A 100-trillion-parameter model would literally be too big to train before the year 2023 comes to an end.
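Rough back-of-envelope with those numbers (assuming, naively, that chip-hours scale linearly with parameter count and that the 9x speedup applies across the board):

```python
# Back-of-envelope estimate for a 100T-parameter run, anchored on PaLM (540B).
# Assumes chip-hours scale linearly with parameter count (ignores data scaling
# entirely, which would make it far worse) and an optimistic ~9x H100-era speedup.

palm_chip_hours = 6144 * 1200 + 3072 * 336   # ~8.4M TPU-v4-chip-hours for 540B params
param_ratio = 100e12 / 540e9                  # ~185x more parameters
speedup = 9                                   # optimistic next-gen hardware gain

chip_hours_needed = palm_chip_hours * param_ratio / speedup
cluster_size = 6144                           # PaLM-sized cluster
wall_clock_years = chip_hours_needed / cluster_size / (24 * 365)

print(f"{chip_hours_needed:.2e} chip-hours, ~{wall_clock_years:.1f} years on {cluster_size} chips")
# => ~1.7e8 chip-hours, i.e. about 3 years of wall clock on a PaLM-sized cluster,
#    and that's before scaling the training data up at all.
```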

3

manOnPavementWaving t1_itsz25o wrote

Wowowow, you're seriously questioning DeepMind's scaling laws and going back to the OpenAI ones, which have been demonstrated to be false?
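(For context: the DeepMind/Chinchilla result works out to roughly 20 training tokens per parameter for compute-optimal models, far more data per parameter than the old OpenAI laws implied. The 20:1 ratio and the PaLM token count below are the commonly cited approximations, not gospel.)

```python
# Rough Chinchilla-style rule of thumb: compute-optimal training wants
# ~20 tokens per parameter. More parameters without proportionally more
# data is, under that reading, largely wasted compute.

def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal token count for a given parameter count."""
    return params * tokens_per_param

for params in (70e9, 540e9, 100e12):
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:>9.0f}B params -> ~{tokens / 1e12:.1f}T tokens")
# 70B   -> ~1.4T tokens   (roughly what Chinchilla itself trained on)
# 540B  -> ~10.8T tokens  (far more than PaLM's ~0.78T)
# 100T  -> ~2000T tokens, orders of magnitude more text than is available
```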

Chain-of-thought prompting, self-consistency, reinforcement learning from human feedback, and data scaling are what's been driving LLM performance lately, noticeably more than parameter scale has (while being significantly cheaper).

Why do you expect such a jump when the industry has been stuck at half a trillion parameters for the past year? All previous jumps were smaller and cost significantly less.

8

manOnPavementWaving t1_itsn0zt wrote

It's actually already stopping: the engineering challenges are getting too big (trends predict 5-10 trillion parameter dense models by now; bet your ass they don't exist), the available data is running out, and the other ways to increase performance are way too easy and way too cheap not to focus on.

4