
Lawjarp2 t1_j9uf61c wrote

It's around as good as GPT-3 (175B) but much smaller (65B), like Chinchilla. If it's released publicly like the OPT models, it could be really big for open source. And if it's optimised to run on a single GPU or a small rig, like FlexGen does, maybe we could all have our own personal assistant or pair programmer.
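Something like this, say (a minimal sketch of the single-GPU idea, not FlexGen itself: 8-bit weights plus automatic offload via Hugging Face transformers; the checkpoint name is a placeholder, assuming the weights get released in an HF-compatible format):

```python
# Minimal sketch (not FlexGen): fit a ~13B model on one consumer GPU by loading
# weights in int8 and letting accelerate offload whatever doesn't fit.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/llama-13b-hf"  # hypothetical checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on GPU/CPU automatically as they fit
    load_in_8bit=True,   # int8 weights via bitsandbytes: ~13 GB instead of ~26 GB fp16
)

prompt = "Write a Python function that reverses a linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```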

34

TeamPupNSudz t1_j9uih5g wrote

> It's around as good as GPT-3(175B) but smaller(65B) like chinchilla.

Based on their claim, it's even more extreme than that. They say the 13B model outperforms GPT-3 (175B), which seems so extreme it's almost outlandish. That's only about 7% of the size.

27

blueSGL t1_j9umbty wrote

> which seems so extreme its almost outlandish.

Reminder that GPT-3 was data-starved per the Chinchilla scaling laws.
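Rough back-of-the-envelope, using the common ~20-tokens-per-parameter reading of Chinchilla and GPT-3's reported ~300B training tokens (both approximate):

```python
# Rough Chinchilla back-of-the-envelope: compute-optimal training uses roughly
# ~20 tokens per parameter (approximate rule of thumb from the Chinchilla paper).
gpt3_params = 175e9
gpt3_tokens_actual = 300e9          # GPT-3 was trained on roughly 300B tokens
tokens_per_param_optimal = 20       # approximate Chinchilla heuristic

gpt3_tokens_optimal = gpt3_params * tokens_per_param_optimal   # ~3.5e12
shortfall = gpt3_tokens_optimal / gpt3_tokens_actual           # ~12x

print(f"Chinchilla-optimal tokens for 175B params: {gpt3_tokens_optimal:.2e}")
print(f"GPT-3 saw roughly {shortfall:.0f}x fewer tokens than that")
```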

24

Lawjarp2 t1_j9uj86z wrote

On some tasks the 7B model seems close enough to the original GPT-3 175B. With some optimization it could probably be run on a good laptop with a reasonable loss in accuracy.
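For a sense of whether laptop-scale hardware is plausible, a rough weights-only memory estimate at different precisions (this ignores activations and the KV cache, so real usage is higher):

```python
# Rough weights-only memory footprint; activations and KV cache add more on top.
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for params in (7, 13, 65):
    fp16 = weight_memory_gb(params, 2)    # 16-bit weights
    int8 = weight_memory_gb(params, 1)    # 8-bit quantized
    int4 = weight_memory_gb(params, 0.5)  # 4-bit quantized
    print(f"{params}B params: fp16 ~{fp16:.1f} GB, int8 ~{int8:.1f} GB, int4 ~{int4:.1f} GB")
```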

The 13B model doesn't outperform GPT-3 at everything, but the 65B one does. It's kind of weird, though, to see their 13B model be nearly as good as their 65B one.

However, all of their models are worse than the biggest Minerva model.

4

DuckyBertDuck t1_j9yjdua wrote

It makes sense if you look at the Chinchilla findings, which suggest that ~10x more data is optimal.
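Roughly, comparing tokens seen per parameter (the token counts below are approximate figures from the respective papers, so treat them as ballpark):

```python
# Approximate tokens-per-parameter comparison: GPT-3 vs reported LLaMA budgets.
models = {
    "GPT-3 175B": (175e9, 300e9),   # ~300B training tokens
    "LLaMA 13B": (13e9, 1.0e12),    # ~1.0T training tokens (per the paper)
    "LLaMA 65B": (65e9, 1.4e12),    # ~1.4T training tokens (per the paper)
}

for name, (params, tokens) in models.items():
    print(f"{name}: ~{tokens / params:.1f} tokens per parameter")
```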

2