Llama proved that GPT-3-3.5 level performance can be squeezed out of relatively wimpy consumer hardware. But GPT-4 is much bigger than GPT-3, so it seems like even optimizing it by orders of magnitude might not be enough to achieve similar results. Is it plausible to expect GPT-4 level performance from consumer hardware in the near future?

Comments

You must log in or register to comment.

UseNew5079 t1_je6lgyb wrote on March 29, 2023 at 7:52 PM

#2,458,730

Maybe 7b model can get GPT-4 level performance if trained for _very_ long. Facebook paper showed that performance increased until the end of training and it looks like there was no plateau. Maybe it's just very inefficient but possible? Or maybe there is another way.

sumane12 t1_je6m9d2 wrote on March 29, 2023 at 7:57 PM

#2,458,885

The human brain proves there's a hell of a way to go.

ExposingMyActions t1_je7f44p wrote on March 29, 2023 at 11:16 PM

#2,465,256

Replying to sumane12 (#2,458,885)

Yup

Scarlet_pot2 t1_je9aq1r wrote on March 30, 2023 at 10:48 AM

#2,479,236

let's find out. train a small model and fine-tune it on gpt-3 / 3.5 / 4

Akimbo333 t1_je9proo wrote on March 30, 2023 at 1:12 PM

#2,482,783

Replying to UseNew5079 (#2,458,730)

Why does performance increase with training instead of parameters?

UseNew5079 t1_je9wrw6 wrote on March 30, 2023 at 2:05 PM

#2,484,726

Replying to Akimbo333 (#2,482,783)

Check LLama paper: https://arxiv.org/pdf/2302.13971.pdf

Specifically this graph: https://paste.pics/6f817f0aa71065e155027d313d70f18c

They increase performance (reduce loss) with parameters or training time. More parameters just allow for faster and deeper initial drop in error/loss but later part looks the same for all model sizes. At least that is my interpretation.