
Lawjarp2 t1_j9uf61c wrote

It's around as good as GPT-3 (175B) but much smaller (65B), like Chinchilla. If it's released publicly like the OPT models, it could be really big for open source. And if it's optimised to run on a single GPU or a small rig, like FlexGen does, maybe we could all have our own personal assistant or pair programmer.
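Something like this, say (a minimal sketch of the single-GPU idea, not FlexGen itself: 8-bit weights plus automatic offload via Hugging Face transformers; the checkpoint name is a placeholder, assuming the weights get released in an HF-compatible format):

```python
# Minimal sketch (not FlexGen): fit a ~13B model on one consumer GPU by loading
# weights in int8 and letting accelerate offload whatever doesn't fit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/llama-13b-hf"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on GPU/CPU automatically as they fit
    load_in_8bit=True,   # int8 weights via bitsandbytes: ~13 GB instead of ~26 GB fp16
)

prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```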

34

TeamPupNSudz t1_j9uih5g wrote

> It's around as good as GPT-3(175B) but smaller(65B) like chinchilla.

Based on their claim, it's even more extreme than that. They say the 13B model outperforms GPT-3 (175B), which seems so extreme it's almost outlandish. That's only about 7% of the size.

27

blueSGL t1_j9umbty wrote

> which seems so extreme its almost outlandish.

Reminder that GPT-3 was data-starved per the Chinchilla scaling laws.
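Rough back-of-the-envelope, using the common ~20-tokens-per-parameter reading of Chinchilla and GPT-3's reported ~300B training tokens (both approximate):

```python
# Rough Chinchilla back-of-the-envelope: compute-optimal training uses roughly
# ~20 tokens per parameter (approximate rule of thumb from the Chinchilla paper).
gpt3_params = 175e9
gpt3_tokens_actual = 300e9          # GPT-3 was trained on roughly 300B tokens
tokens_per_param_optimal = 20       # approximate Chinchilla heuristic

gpt3_tokens_optimal = gpt3_params * tokens_per_param_optimal   # ~3.5e12
shortfall = gpt3_tokens_optimal / gpt3_tokens_actual           # ~12x

print(f"Chinchilla-optimal tokens for 175B params: {gpt3_tokens_optimal:.2e}")
print(f"GPT-3 saw roughly {shortfall:.0f}x fewer tokens than that")
```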

24

Lawjarp2 t1_j9uj86z wrote

On some tasks the 7B model seems close enough to the original GPT-3 175B. With some optimization it could probably be run on a good laptop with a reasonable loss in accuracy.
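For a sense of whether laptop-scale hardware is plausible, a rough weights-only memory estimate at different precisions (this ignores activations and the KV cache, so real usage is higher):

```python
# Rough weights-only memory footprint; activations and KV cache add more on top.
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params in (7, 13, 65):
    fp16 = weight_memory_gb(params, 2)    # 16-bit weights
    int8 = weight_memory_gb(params, 1)    # 8-bit quantized
    int4 = weight_memory_gb(params, 0.5)  # 4-bit quantized
    print(f"{params}B params: fp16 ~{fp16:.1f} GB, int8 ~{int8:.1f} GB, int4 ~{int4:.1f} GB")
```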

The 13B model doesn't outperform GPT-3 at everything, but the 65B one does. It's kind of weird, though, to see their 13B model be nearly as good as their 65B one.

However, all of their models are worse than the biggest Minerva model.

4

DuckyBertDuck t1_j9yjdua wrote

It makes sense if you look at the Chinchilla findings, which suggest that ~10x more data is optimal.
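Roughly, comparing tokens seen per parameter (the token counts below are approximate figures from the respective papers, so treat them as ballpark):

```python
# Approximate tokens-per-parameter comparison: GPT-3 vs reported LLaMA budgets.
models = {
    "GPT-3 175B": (175e9, 300e9),   # ~300B training tokens
    "LLaMA 13B": (13e9, 1.0e12),    # ~1.0T training tokens (per the paper)
    "LLaMA 65B": (65e9, 1.4e12),    # ~1.4T training tokens (per the paper)
}

for name, (params, tokens) in models.items():
    print(f"{name}: ~{tokens / params:.1f} tokens per parameter")
```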

2