PilotThen t1_jdpnoul wrote
Reply to comment by ganzzahl in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I didn't find a paper, but I think that's sort of what EleutherAI was doing with their Pythia models.
You'll find the models on Hugging Face, and I'd say they're also interesting from an open-source perspective because of their license (Apache-2.0).
(Also, Open Assistant seems to be building on top of them.)
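If it helps anyone poking at them, here's a minimal sketch of pulling one of the Pythia checkpoints with the transformers library. The 1.4b size is just my pick; swap in whichever size fits your hardware.

```python
# Minimal sketch: load a Pythia checkpoint from the Hugging Face Hub and sample from it.
# "EleutherAI/pythia-1.4b" is one of the published sizes; any of the others works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open-source language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```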
PilotThen t1_jdpn8eb wrote
I'm down the rabbit hole of finding the best model to build on and learn with this weekend.
Currently poking at PygmalionAI/pygmalion-1.3b
Beware: the different-sized Pygmalion models are fine-tuned from different pretrained models, so they have inherited different licenses.
I like my results with 6b better, but 1.3b has the better license (AGPL-3.0).
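For anyone else comparing them, this is roughly how I'm loading the 1.3b one. The prompt here is generic on purpose; the model card describes the persona/chat format the model actually expects.

```python
# Minimal sketch: load the 1.3b Pygmalion checkpoint and generate a short reply.
# Prompt formatting is a placeholder; see the model card for the intended chat format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PygmalionAI/pygmalion-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "You: Hi, how are you?\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```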
PilotThen t1_jdppmpl wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There's also the point that they optimise for compute at training time.
In mass deployment, compute at inference time starts to matter.
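Rough back-of-envelope sketch of what I mean. The numbers are made up, and I'm using the usual approximations of ~6·N·D FLOPs for training and ~2·N FLOPs per generated token at inference:

```python
# Back-of-envelope sketch (hypothetical numbers): once a model is served at scale,
# cumulative inference compute catches up with the one-off training compute.
params = 70e9              # hypothetical 70B-parameter model
train_tokens = 1.4e12      # Chinchilla-style token budget for that size
train_flops = 6 * params * train_tokens          # ~6*N*D approximation

tokens_served_per_day = 10e9                     # hypothetical mass-deployment load
inference_flops_per_day = 2 * params * tokens_served_per_day   # ~2*N per token

days_to_match_training = train_flops / inference_flops_per_day
print(f"Training compute:           {train_flops:.2e} FLOPs")
print(f"Inference compute per day:  {inference_flops_per_day:.2e} FLOPs")
print(f"Days of serving to equal training compute: {days_to_match_training:.0f}")
```

With those made-up numbers, serving matches the training bill in roughly a year and a bit, so a smaller model that's cheaper per token can win overall even if it cost relatively more to train per unit of quality.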