IluvBsissa t1_j9j6v5v wrote
Reply to comment by Destiny_Knight in A German AI startup just might have a GPT-4 competitor this year. It is 300 billion parameters model by Dr_Singularity
If these models are so smol and efficient, why are they not released? I just don't get it. I thought PaLM was kept private because it was too costly to run to be profitable...
kermunnist t1_j9kqsaw wrote
That's because the smaller models are less useful. With neural networks (likely including biological ones) there's a hard trade-off between specialized performance and general performance. If these 100+x smaller models were trained on the same data as GPT-3, they would perform 100+x worse on these metrics (maybe not exactly, because in this case the model was multimodal, which definitely gave a performance advantage). The big reason this model performed so well is that it was fine-tuned on problems similar to the ones on this exam, whereas GPT-3 was fine-tuned on anything and everything. This means that this model would likely not be a great conversationalist and would probably flounder at most other tasks GPT-3.5 does well on.