Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
currentscurrents t1_jdmzphs wrote
Reply to comment by gamerx88 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's true, but only for the given compute budget used in training.
Right now we're really limited by compute power, while training data is cheap. Chinchilla and LLaMA intentionally trade model size for data: at a fixed compute budget, a smaller model trained on more tokens wins. Larger models still perform better than smaller ones given the same amount of data.
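As a rough illustration of that fixed-budget tradeoff, here's a back-of-the-envelope sketch using the approximate relations C ≈ 6·N·D and the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the constants are approximations, not figures from this thread:

```python
# Rough Chinchilla-style back-of-the-envelope (all numbers are approximations).
# Training compute C ~ 6 * N * D FLOPs, and the compute-optimal ratio is
# roughly D ~ 20 tokens per parameter (Hoffmann et al., 2022).

def compute_optimal_split(train_flops: float, tokens_per_param: float = 20.0):
    """Given a training-compute budget, return the roughly compute-optimal
    parameter count N and token count D under C = 6 * N * D, D = r * N."""
    n_params = (train_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # e.g. a ~5.7e23 FLOP budget (roughly Chinchilla-scale)
    n, d = compute_optimal_split(5.7e23)
    print(f"~{n/1e9:.0f}B params on ~{d/1e12:.1f}T tokens")
```

At that budget the sketch lands on roughly 70B parameters and 1.4T tokens, which is about where Chinchilla sits.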
In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
gamerx88 t1_jdn1dd3 wrote
> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
I agree, but I think data is already a limiting factor today, with the largest publicly known models at around 175B parameters. The data used to train these models supposedly already covers a majority of the open internet.
PilotThen t1_jdppmpl wrote
There's also the point that they optimise for compute at training time.
In mass deployment, compute at inference time starts to matter.
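A minimal sketch of that training-vs-inference tradeoff, assuming training costs roughly 6·N·D FLOPs and each generated token costs roughly 2·N FLOPs; the lifetime served-token count below is an assumption for illustration:

```python
# Back-of-the-envelope comparison of training vs. lifetime inference compute,
# assuming ~6*N*D FLOPs to train and ~2*N FLOPs per inference token.
# Model sizes / token counts are illustrative.

def training_flops(n_params: float, n_train_tokens: float) -> float:
    return 6.0 * n_params * n_train_tokens

def inference_flops(n_params: float, n_served_tokens: float) -> float:
    return 2.0 * n_params * n_served_tokens

if __name__ == "__main__":
    served = 1e13  # assume 10T tokens served over the deployment lifetime
    for name, n, d in [("13B on 1T tokens", 13e9, 1e12),
                       ("65B on 1.4T tokens", 65e9, 1.4e12)]:
        total = training_flops(n, d) + inference_flops(n, served)
        print(f"{name}: total ~{total:.2e} FLOPs")
```

Once enough tokens are served, the smaller model's lower per-token cost dominates the extra data it was trained on, which is the LLaMA-style argument for overtraining small models.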