TeamPupNSudz t1_j9uih5g wrote

> It's around as good as GPT-3 (175B) but smaller (65B), like Chinchilla.

Based on their claim, it's even more extreme than that. They say the 13B model outperforms GPT-3 (175B), which seems so extreme it's almost outlandish. That's only about 7% of the size.

27

blueSGL t1_j9umbty wrote

> which seems so extreme it's almost outlandish.

Reminder that GPT-3 was data-starved, per the Chinchilla scaling laws.
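
Back-of-the-envelope, using the popular ~20 tokens/parameter rule of thumb people take from the Chinchilla paper (the exact ratio is a simplification of the paper's fits, so treat these numbers as rough):

```python
# Rough Chinchilla-style estimate; the ~20 tokens/parameter
# ratio is a common rule of thumb, not an exact law.
TOKENS_PER_PARAM = 20  # assumed compute-optimal ratio

gpt3_params = 175e9
gpt3_tokens_actual = 300e9  # GPT-3 trained on ~300B tokens

gpt3_tokens_optimal = gpt3_params * TOKENS_PER_PARAM  # ~3.5T
shortfall = gpt3_tokens_optimal / gpt3_tokens_actual

print(f"Chinchilla-optimal tokens for 175B: {gpt3_tokens_optimal:.2e}")
print(f"Actual GPT-3 training tokens:       {gpt3_tokens_actual:.2e}")
print(f"GPT-3 saw roughly {shortfall:.0f}x less data than 'optimal'")
```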

24

Lawjarp2 t1_j9uj86z wrote

On some tasks, the 7B model seems close enough to the original GPT-3 (175B). With some optimization it could probably be run on a good laptop with a reasonable loss in accuracy.

The 13B model doesn't outperform it on everything; the 65B one does, however. But it's kinda weird to see their 13B model be nearly as good as their 65B one.

That said, all their models are worse than the biggest Minerva model.
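
On the laptop point, here's a quick memory estimate. This is just a sketch of weight storage: it ignores activations, KV cache, and framework overhead, and the quantization bit-widths are illustrative, not a specific toolchain's:

```python
# Approximate weight-memory footprint of a 7B-parameter model
# at different precisions. Real usage will be somewhat higher
# due to activations, KV cache, and runtime overhead.
params = 7e9

for name, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: ~{gib:.1f} GiB")

# fp16 (~13 GiB) is out of reach for most laptops, but int8
# (~6.5 GiB) or int4 (~3.3 GiB) quantization could plausibly fit.
```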

4

DuckyBertDuck t1_j9yjdua wrote

It makes sense if you look at the Chinchilla findings, which suggest that ~10x more data is optimal.
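
For reference, the Chinchilla paper (Hoffmann et al., 2022) fits a parametric loss L(N, D) = E + A/N^α + B/D^β. The sketch below plugs in the commonly cited fitted constants to compare GPT-3's allocation with a smaller-but-data-heavy one roughly like LLaMA-65B's (token counts approximate, and the comparison setup is my own):

```python
# Chinchilla parametric loss fit (Hoffmann et al., 2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants below are the paper's fitted values as commonly cited.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    return E + A / n_params**alpha + B / n_tokens**beta

# GPT-3-style allocation: big model, relatively little data.
print(f"175B / 300B tokens: L = {loss(175e9, 300e9):.3f}")

# Smaller model, far more data (~LLaMA-65B's ~1.4T tokens):
# the fit predicts a lower loss despite far fewer parameters.
print(f" 65B / 1.4T tokens: L = {loss(65e9, 1.4e12):.3f}")
```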

2