
UseNew5079 t1_je9wrw6 wrote

Check the LLaMA paper: https://arxiv.org/pdf/2302.13971.pdf

Specifically this graph: https://paste.pics/6f817f0aa71065e155027d313d70f18c

Performance improves (loss decreases) with both parameter count and training time. More parameters mainly give a faster and deeper initial drop in loss, but the later part of the curves looks similar across model sizes. At least that's my interpretation.
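For intuition, this matches the Chinchilla-style scaling law from Hoffmann et al. (2022), which LLaMA builds on: loss is modeled as a function of parameters N and training tokens D. Here's a minimal sketch using the approximate fitted constants reported in that paper (not LLaMA's own fit, just an illustration of the shape):

```python
# Chinchilla-style scaling law (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the approximate published fits; LLaMA's
# curves follow a similar pattern but were not fit this way here.

E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Compare a 7B and a 65B model across growing token budgets:
for tokens in (2e11, 5e11, 1e12, 1.4e12):
    print(f"{tokens:.0e} tokens: "
          f"7B -> {loss(7e9, tokens):.3f}, "
          f"65B -> {loss(65e9, tokens):.3f}")
```

The larger model starts at a lower loss (the 1/N^alpha term), but both curves flatten at the same rate in tokens (the 1/D^beta term), which is roughly what the graph shows.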
