lucidraisin t1_j61h7lf wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
And one more paper along the same lines! https://arxiv.org/abs/2212.07677
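For anyone curious what "performing gradient descent" means concretely, here is a minimal numpy sketch of the core construction in that linked paper: a single softmax-free (linear) self-attention read-out over in-context examples gives the same prediction as one gradient-descent step on a linear regression loss. Variable names and the setup below are mine, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32                      # feature dim, number of in-context examples
X = rng.normal(size=(n, d))       # in-context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true                    # in-context targets y_i
x_q = rng.normal(size=d)          # query input
lr = 0.1

# (1) One GD step on L(w) = 1/(2n) * sum_i (w.x_i - y_i)^2, starting at w = 0.
grad = -(X.T @ y) / n             # dL/dw evaluated at w = 0
w_one_step = -lr * grad
pred_gd = w_one_step @ x_q

# (2) Linear self-attention: values = y_i, keys = x_i, query = x_q,
#     with the learning rate folded into the query projection.
pred_attn = (lr / n) * np.sum(y * (X @ x_q))

print(pred_gd, pred_attn)         # the two predictions coincide
assert np.allclose(pred_gd, pred_attn)
```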
currentscurrents OP t1_j61ndkl wrote
Thanks for the link!
I think it's interesting that researchers spent so much time in the 90s trying to make meta-learning work, and now it just emerges from throwing scale at the problem.
DigThatData t1_j61zv3l wrote
Compute Is All You Need
endless_sea_of_stars t1_j627a9m wrote
Just rent out an AWS region for a month and you'll be good to go. Hold a couple bake sales to defray the cost.
robdogcronin t1_j61zvce wrote
That's the bitter lesson
currentscurrents OP t1_j623hb4 wrote
Yeah, but I want AI now. Not in 40 years when computers are 1000x better.
Also, I'm not sure computers will be 1000x better in 40 years; Moore's law isn't what it used to be.
EarthquakeBass t1_j64jhk3 wrote
https://en.m.wikipedia.org/wiki/Huang%27s_law
A bit of marketing flair for sure, but I think at the crossroads of hardware improvements, ensembling, clever optimizations, etc., we will keep improving models at a pretty darn fast pace. GPT-3 alone has dramatically improved the productivity of engineers, I'm sure of it.
throwaway2676 t1_j68vbfq wrote
> Not in 40 years when computers are 1000x better.
It won't take anywhere near that long. We've barely scratched the surface of ASICs and analog matrix multiplication, which is where the real fun is going to begin.