UseNew5079
UseNew5079 t1_je9wrw6 wrote
Reply to comment by Akimbo333 in How much smaller can a GPT-4-level model get? by Technologenesis
Check the LLaMA paper: https://arxiv.org/pdf/2302.13971.pdf
Specifically, this graph: https://paste.pics/6f817f0aa71065e155027d313d70f18c
Performance improves (loss drops) with both parameter count and training time. More parameters just allow a faster and deeper initial drop in loss, but the later part of the curve looks the same for all model sizes. At least that is my interpretation.
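For context, curves like this are usually fit with a Chinchilla-style scaling law (Hoffmann et al., 2022); that form comes from the Chinchilla paper, not the LLaMA paper, so treat it as a rough sketch:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Here $N$ is the parameter count, $D$ the number of training tokens, and $E$, $A$, $B$, $\alpha$, $\beta$ are fitted constants. A larger $N$ shrinks the $A/N^{\alpha}$ term (the faster, deeper initial drop), while longer training shrinks $B/D^{\beta}$ in the same way for every model size, which matches the curves converging later on.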
UseNew5079 t1_je6lgyb wrote
Maybe a 7B model can get GPT-4-level performance if trained for a _very_ long time. The Facebook paper showed that performance kept increasing until the end of training, and it looks like there was no plateau. Maybe it's just very inefficient but possible? Or maybe there is another way.
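If the Chinchilla-style form from my other comment is roughly right, then for a fixed 7B model the data term keeps shrinking with more training, but the loss still has a floor:

$$\lim_{D \to \infty} L(N, D) = E + \frac{A}{N^{\alpha}}$$

So whether 7B can reach GPT-4 level comes down to where that floor sits relative to GPT-4's loss. That's my back-of-the-envelope reading, not a claim from the paper.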
UseNew5079 t1_jc2a9t1 wrote
Reply to comment by sigoden in AIChat: A cli tool to chat with gpt-3.5/chatgpt in terminal. by sigoden
Oh, I see. That makes sense. So if you try to send more than 4096 tokens, will the API respond with an error, or will it charge you for >4096 tokens and just drop the oldest ones?
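For anyone curious, here's a minimal sketch of checking this client-side before sending, assuming the tiktoken library and the cl100k_base encoding (the exact encoding and limit depend on the model):

```python
# Rough client-side length check; assumes tiktoken and the cl100k_base
# encoding used by gpt-3.5-turbo. The 4096-token window covers both the
# prompt and the completion, so in practice you'd reserve room for the reply.
import tiktoken

MAX_CONTEXT_TOKENS = 4096

def count_tokens(text: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

prompt = open("prompt.txt").read()
if count_tokens(prompt) > MAX_CONTEXT_TOKENS:
    raise SystemExit("prompt alone already exceeds the context window")
```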
UseNew5079 t1_jc204th wrote
Really cool project! There should be a configuration option to limit the size of the input text, to protect against mistakes like `cat big_big_file | aichat`.
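Something like this rough sketch, say (the option name and limit are made up for illustration, not aichat's actual config):

```python
# Hypothetical input-size guard; `max_input_chars` and the error message
# are invented for illustration and are not real aichat options.
import sys

max_input_chars = 16_000  # imagined config value

# Read at most one character past the limit so we can detect oversize input
# without buffering an arbitrarily large file.
data = sys.stdin.read(max_input_chars + 1)
if len(data) > max_input_chars:
    sys.exit(f"error: piped input exceeds {max_input_chars} characters; aborting")
print(f"ok, read {len(data)} characters")
```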
UseNew5079 t1_j8xiqa3 wrote
Reply to Microsoft Killed Bing by Neurogence
At least they have shown what is possible. There is no going back.
UseNew5079 t1_jecefwx wrote
Reply to [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Excellent-quality responses from this model. This could actually be usable.