Submitted by mrx-ai t3_121q6nk in MachineLearning
currentscurrents t1_jdn7spo wrote
Reply to comment by pornthrowaway42069l in [N] GPT-4 has 1 trillion parameters by mrx-ai
Bigger models are more sample efficient for a given amount of data.
Scale is a triangle of three factors: model size, data size, and compute. If you want to make more efficient use of data, you need to scale up the other two.
In practice, LLMs are not data-limited right now; they're limited by compute and model size. That's why you see models like LLaMA that throw huge amounts of data at a smaller model.
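To make the trade-off concrete, here's a rough back-of-the-envelope sketch (my own illustration, not from the thread). It assumes the common C ≈ 6·N·D approximation for training FLOPs and the Chinchilla-style heuristic of roughly 20 training tokens per parameter; LLaMA-7B's ~1T training tokens is far past that point, which is the "throw data at a smaller model" regime:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via C ~= 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token budget under the ~20 tokens/parameter rule of thumb."""
    return tokens_per_param * n_params

# LLaMA-7B-style setup: a small model trained on much more data than the heuristic suggests.
n = 7e9                                    # 7B parameters
d_optimal = chinchilla_optimal_tokens(n)   # ~1.4e11 tokens (~140B)
d_actual = 1e12                            # ~1T tokens actually used

print(f"compute-optimal tokens: {d_optimal:.2e}")
print(f"actual tokens:          {d_actual:.2e}")
print(f"FLOPs at optimal:       {training_flops(n, d_optimal):.2e}")
print(f"FLOPs at actual:        {training_flops(n, d_actual):.2e}")
```

The point of the extra data is that you pay more training compute up front to get a smaller model that's cheaper to run at inference time.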
pornthrowaway42069l t1_jdnmf0j wrote
I'm confused: how is that different from what I said? Maybe I worded my response poorly, but I meant that we should focus on smaller models rather than those gigantic ones.