Submitted by Vegetable-Skill-9700 t3_121a8p4 in MachineLearning
currentscurrents t1_jdmzphs wrote
Reply to comment by gamerx88 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
That's true, but only for the given compute budget used in training.
Right now we're really limited by compute power, while training data is cheap. Chinchilla and LLaMA intentionally trade model size for data: at a fixed compute budget, a smaller model trained on more tokens wins. Larger models still perform better than smaller ones given the same amount of data.
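As a rough illustration of that fixed-budget tradeoff, here's a back-of-the-envelope sketch using the approximate relations C ≈ 6·N·D and the Chinchilla rule of thumb of roughly 20 training tokens per parameter; the constants are approximations, not figures from this thread:

```python
# Rough Chinchilla-style back-of-the-envelope (all numbers are approximations).
# Training compute C ~ 6 * N * D FLOPs, and the compute-optimal ratio is
# roughly D ~ 20 tokens per parameter (Hoffmann et al., 2022).

def compute_optimal_split(train_flops: float, tokens_per_param: float = 20.0):
    """Given a training-compute budget, return the roughly compute-optimal
    parameter count N and token count D under C = 6 * N * D, D = r * N."""
    n_params = (train_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # e.g. a ~5.7e23 FLOP budget (roughly Chinchilla-scale)
    n, d = compute_optimal_split(5.7e23)
    print(f"~{n/1e9:.0f}B params on ~{d/1e12:.1f}T tokens")
```

At that budget the sketch lands on roughly 70B parameters and 1.4T tokens, which is about where Chinchilla sits.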
In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
gamerx88 t1_jdn1dd3 wrote
> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.
I agree, but I think data is already a limiting factor today, with the largest publicly known models at around 175B parameters. The data used to train these models supposedly already covers a majority of the open internet.
PilotThen t1_jdppmpl wrote
There's also the point that they optimise for compute at training time.
In mass deployment, compute at inference time starts to matter.
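A minimal sketch of that training-vs-inference tradeoff, assuming training costs roughly 6·N·D FLOPs and each generated token costs roughly 2·N FLOPs; the lifetime served-token count below is an assumption for illustration:

```python
# Back-of-the-envelope comparison of training vs. lifetime inference compute,
# assuming ~6*N*D FLOPs to train and ~2*N FLOPs per inference token.
# Model sizes / token counts are illustrative.

def training_flops(n_params: float, n_train_tokens: float) -> float:
    return 6.0 * n_params * n_train_tokens

def inference_flops(n_params: float, n_served_tokens: float) -> float:
    return 2.0 * n_params * n_served_tokens

if __name__ == "__main__":
    served = 1e13  # assume 10T tokens served over the deployment lifetime
    for name, n, d in [("13B on 1T tokens", 13e9, 1e12),
                       ("65B on 1.4T tokens", 65e9, 1.4e12)]:
        total = training_flops(n, d) + inference_flops(n, served)
        print(f"{name}: total ~{total:.2e} FLOPs")
```

Once enough tokens are served, the smaller model's lower per-token cost dominates the extra data it was trained on, which is the LLaMA-style argument for overtraining small models.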