PilotThen t1_jdpnoul wrote
Reply to comment by ganzzahl in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
I didn't find a paper, but I think that's sort of what EleutherAI was doing with their Pythia models.
You'll find the models on Hugging Face, and I'd say they're also interesting from an open-source perspective because of their license (Apache-2.0).
(Also, Open Assistant seems to be building on top of them.)
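If it helps anyone poking at them, here's a minimal sketch of pulling one of the Pythia checkpoints with the transformers library. The 1.4b size is just my pick; swap in whichever size fits your hardware.

```python
# Minimal sketch: load a Pythia checkpoint from the Hugging Face Hub and sample from it.
# "EleutherAI/pythia-1.4b" is one of the published sizes; any of the others works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-1.4b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Open-source language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```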
PilotThen t1_jdpn8eb wrote
I'm down the rabbit hole of finding the best model to build on and learn with this weekend.
Currently poking at PygmalionAI/pygmalion-1.3b
Beware: the different-sized Pygmalion models are fine-tuned from different pretrained models, so they have inherited different licenses.
I like my results with 6b better, but 1.3b has the better license (AGPL-3.0).
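For anyone else comparing them, this is roughly how I'm loading the 1.3b one. The prompt here is generic on purpose; the model card describes the persona/chat format the model actually expects.

```python
# Minimal sketch: load the 1.3b Pygmalion checkpoint and generate a short reply.
# Prompt formatting is a placeholder; see the model card for the intended chat format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "PygmalionAI/pygmalion-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "You: Hi, how are you?\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```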
PilotThen t1_jdppmpl wrote
Reply to comment by currentscurrents in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
There's also the point that they optimise for compute at training time.
In mass deployment, compute at inference time starts to matter.
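Rough back-of-envelope sketch of what I mean. The numbers are made up, and I'm using the usual approximations of ~6·N·D FLOPs for training and ~2·N FLOPs per generated token at inference:

```python
# Back-of-envelope sketch (hypothetical numbers): once a model is served at scale,
# cumulative inference compute catches up with the one-off training compute.
params = 70e9              # hypothetical 70B-parameter model
train_tokens = 1.4e12      # Chinchilla-style token budget for that size
train_flops = 6 * params * train_tokens          # ~6*N*D approximation

tokens_served_per_day = 10e9                     # hypothetical mass-deployment load
inference_flops_per_day = 2 * params * tokens_served_per_day   # ~2*N per token

days_to_match_training = train_flops / inference_flops_per_day
print(f"Training compute:           {train_flops:.2e} FLOPs")
print(f"Inference compute per day:  {inference_flops_per_day:.2e} FLOPs")
print(f"Days of serving to equal training compute: {days_to_match_training:.0f}")
```

With those made-up numbers, serving matches the training bill in roughly a year and a bit, so a smaller model that's cheaper per token can win overall even if it cost relatively more to train per unit of quality.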