Submitted by Technologenesis t3_125wzvw in singularity
Akimbo333 t1_je9proo wrote
Reply to comment by UseNew5079 in How much smaller can a GPT-4-level model get? by Technologenesis
Why does performance increase with more training rather than with more parameters?
UseNew5079 t1_je9wrw6 wrote
Check the LLaMA paper: https://arxiv.org/pdf/2302.13971.pdf

Specifically, this graph: https://paste.pics/6f817f0aa71065e155027d313d70f18c
Both more parameters and more training (more tokens seen) increase performance, i.e. reduce loss. More parameters mainly give a faster and deeper initial drop in loss, but the later part of the curve looks similar across model sizes. At least that's my interpretation.
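As a rough illustration of that shape, here is a minimal sketch using the Chinchilla-style scaling law from Hoffmann et al. (2022), loss(N, D) = E + A/N^α + B/D^β. The fitted constants below are the ones published in that paper, used purely for illustration; they are not fits to the LLaMA curves in the linked graph:

```python
import numpy as np

# Chinchilla-style scaling law (Hoffmann et al., 2022):
#   loss(N, D) = E + A / N**alpha + B / D**beta
# where N = parameter count and D = training tokens.
# Constants are the published Chinchilla fits, shown for illustration only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

tokens = np.logspace(9, 12, 7)        # 1B .. 1T training tokens
for n in (7e9, 13e9, 33e9, 65e9):     # the LLaMA model sizes
    curve = [loss(n, d) for d in tokens]
    print(f"{n / 1e9:>4.0f}B:", " ".join(f"{l:.3f}" for l in curve))
```

In this functional form the parameter term A/N^α is an additive offset, so bigger models sit on a lower curve from the start, while the token-dependent term B/D^β has the same shape for every size, which matches the observation that the later parts of the curves look alike.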