
porcenat_k t1_itsrnjb wrote

"trends predict 5-10 trillion parameter dense models by now, bet your ass they don't exist), the data available is getting too few".

I beg to differ. Indeed, we should expect to see 10 to 20 trillion parameter models this year. Based on industry movements, I'm expecting Meta or OpenAI to produce such a model by the end of this year, if not Q1 2023. We don't have enough data for Chinchilla compute-optimal models. The DeepMind scaling laws are flawed in a number of fundamental ways, one of which is that sample efficiency, generality, and intelligence increase with scale: large vanilla models require less data to achieve better performance. We can train multi-trillion parameter dense models with the same data, or better yet, less data than it took to train GPT-3. It is certainly possible to train such a model with massive compute clusters running on thousands of A100 GPUs, which is exactly what is being done right now. The cheap methods being focused on right now are a temporary crutch, which I project will be put away once firms are able to adopt new GPUs such as the H100.
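(For context on the data constraint being argued about here, a rough back-of-the-envelope sketch, not from either commenter: the widely cited Chinchilla approximation is roughly 20 training tokens per parameter, and GPT-3's published figures are about 175B parameters trained on roughly 300B tokens. The numbers below just apply that rule of thumb.)

```python
# Back-of-the-envelope check of the "not enough data" argument, using the
# commonly cited Chinchilla rule of thumb of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20  # Chinchilla compute-optimal approximation

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens for a dense model."""
    return TOKENS_PER_PARAM * n_params

for n_params in (175e9, 1e12, 10e12):
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e12:>6.3f}T params -> ~{tokens / 1e12:.1f}T tokens")

# Approximate output:
#  0.175T params -> ~3.5T tokens
#  1.000T params -> ~20.0T tokens
# 10.000T params -> ~200.0T tokens
# GPT-3 was trained on roughly 0.3T tokens, so a Chinchilla-optimal
# 10T-parameter model would need hundreds of times more text than GPT-3 saw.
```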

15

manOnPavementWaving t1_itsz25o wrote

Wowowow, you're seriously questioning the DeepMind scaling laws and going back to the OpenAI ones, which have been demonstrated to be false?

Chain-of-thought prompting, self-consistency, reinforcement learning from human feedback, and data scaling are what have been driving LLM performance lately, noticeably more than scale has (whilst being significantly cheaper).
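(To illustrate one of those techniques, here is a minimal self-consistency sketch: sample several chain-of-thought completions and majority-vote the final answers. The `sample_completion` and `extract_answer` callables are placeholders for whatever model call and answer parsing you actually use, not any particular API.)

```python
from collections import Counter
from typing import Callable

def self_consistency(prompt: str,
                     sample_completion: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 10) -> str:
    """Sample several reasoning chains and return the most common final answer."""
    answers = []
    for _ in range(n_samples):
        completion = sample_completion(prompt)       # one sampled chain of thought
        answers.append(extract_answer(completion))   # keep only the final answer
    # Majority vote across the sampled chains.
    return Counter(answers).most_common(1)[0][0]
```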

Why do you expect such a jump when the industry has been stuck at half a trillion for the past year? All previous jumps were smaller and cost significantly less.

8

porcenat_k t1_itt4w3g wrote

>Why do you expect such a jump when the industry has been stuck at half a trillion for the past year? All previous jumps were smaller and cost significantly less.

A combination of software and hardware improvements currently being worked on using Nvidia GPUs: https://azure.microsoft.com/en-us/blog/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed/
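(For a sense of what the software side looks like in practice, a minimal, hypothetical DeepSpeed ZeRO stage-3 configuration sketch of the kind the linked post discusses; the values are placeholders, not Microsoft's settings.)

```python
import deepspeed  # requires the deepspeed package and a multi-GPU CUDA setup

# Hypothetical ZeRO stage-3 config: parameters, gradients, and optimizer states
# are partitioned across GPUs so models larger than one GPU's memory can train.
ds_config = {
    "train_batch_size": 2048,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params/grads/optimizer state
        "offload_optimizer": {"device": "cpu"},  # optionally spill optimizer state to CPU
    },
}

# `model` would be an ordinary PyTorch nn.Module defined elsewhere:
# model_engine, optimizer, _, _ = deepspeed.initialize(
#     model=model,
#     model_parameters=model.parameters(),
#     config=ds_config,
# )
```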

With regard to Chinchilla, I don't think they disproved anything. See my comment history if you care enough; I've debated this topic quite extensively.

7

justowen4 t1_itt5mpf wrote

It’s simply going to be both scenarios in 2023, quantity and quality: synthetic data variations from existing corpora with better training distributions (pseudo-sparsity) on optimized hardware. Maybe even some novel chips like photonic or analog later next year. It’s like CPUs 20 years ago: optimizations all around!

6