blueSGL t1_jc5s56i wrote
Reply to comment by phire in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Less than $100 to get this sort of performance out of a 7B parameter model and from the LLaMA paper they stopped training the 7B and 13B parameter models early.
Question is now just how much better can small models get. (lawyer/doctor/therapist in everyone's pocket, completely private?)
inigid t1_jc5za26 wrote
I'm thinking a chip with the model and inference runtime baked in, maybe having the same form factor as an SD card. Hey honey have you seen that copy of me from March 2023? Ughh, I think I accidentally threw it away..
Necessary_Ad_9800 t1_jcjge23 wrote
Everyone with their own private oracle in their hands. Pretty cool tbh
blueSGL t1_jcjgsl1 wrote
Exactly.
I'm just eager to see what fine tunes are going to be made on LLaMA now, and how model merging affects them. The combination of those two techniques has led to some crazy advancements in the Stable Diffusion world. No idea if merging will work with LLMs as it does for diffusion models. (has anyone even tried yet?)
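For anyone curious, the simplest kind of merge used in the Stable Diffusion community is just a weighted average of matching parameters from two checkpoints. A toy sketch of that idea (plain floats stand in for real tensor state dicts, and the layer names are made up for illustration):

```python
def merge_checkpoints(ckpt_a, ckpt_b, alpha=0.5):
    """Linearly interpolate matching parameters: alpha*A + (1 - alpha)*B."""
    # Both checkpoints must come from the same architecture,
    # i.e. have identical parameter names/shapes.
    assert ckpt_a.keys() == ckpt_b.keys(), "architectures must match"
    return {k: alpha * ckpt_a[k] + (1 - alpha) * ckpt_b[k] for k in ckpt_a}

# Toy example with scalar "weights" (hypothetical layer names):
base = {"layer.weight": 1.0, "layer.bias": 0.0}
tuned = {"layer.weight": 3.0, "layer.bias": 2.0}
merged = merge_checkpoints(base, tuned, alpha=0.5)
# merged == {"layer.weight": 2.0, "layer.bias": 1.0}
```

Whether this kind of naive interpolation preserves capability in an LLM the way it often does for diffusion models is exactly the open question.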
Necessary_Ad_9800 t1_jcjj8b6 wrote
Interesting. However I find some merges in SD to be terrible. But I have no doubt the open source community will make something amazing