_Arsenie_Boca_ t1_itwt4ls wrote
I don't think any model you can run on a single commodity GPU will be on par with GPT-3. Perhaps GPT-J, OPT-{6.7B / 13B} and GPT-NeoX-20B are the best alternatives. Some might need significant engineering (e.g. DeepSpeed) to work with limited VRAM.
deeceeo t1_itxiswn wrote
UL2 is 20b and supposedly on par with GPT-3?
_Arsenie_Boca_ t1_ityby3b wrote
True, I forgot about that one. Although getting a 20B model (NeoX-20B or UL2 20B) to run on an RTX GPU is probably a big stretch.
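The "big stretch" is easy to quantify with a weights-only back-of-the-envelope estimate (the helper below is my own sketch, not from the thread; it ignores activations, the KV cache, and framework overhead, which all add more on top):

```python
# Rough VRAM needed just to hold model weights at a given precision.
# Parameter counts are the models' public sizes; helper name is hypothetical.
def weights_vram_gb(n_params_billion, bytes_per_param=2):
    """fp16/bf16 uses 2 bytes per parameter; fp32 would double this."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for name, billions in [("GPT-J-6B", 6), ("OPT-13B", 13), ("GPT-NeoX-20B", 20)]:
    print(f"{name}: ~{weights_vram_gb(billions):.0f} GB in fp16")
```

At fp16, a 20B model needs roughly 37 GB for weights alone, well past the 24 GB on a top consumer card like an RTX 3090/4090 — hence the need for tricks like 8-bit quantization or CPU/NVMe offloading (e.g. DeepSpeed ZeRO-Inference) to squeeze these models onto one GPU.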
AuspiciousApple OP t1_itwvq1f wrote
>I don't think any model you can run on a single commodity GPU will be on par with GPT-3.
That makes sense. I'm not an NLP person, so I don't have a good intuition on how these models scale or what the benchmark numbers actually mean.
In CV, the difference between a small and a large model might be a few % accuracy on ImageNet, but even small models work reasonably well. FLAN-T5-XL seems to generate nonsense 90% of the time for the prompts I've tried, whereas GPT-3 produces great output most of the time.
Do you have any experience with these open models?
_Arsenie_Boca_ t1_ityccjh wrote
I don't think there is a fundamental difference between CV and NLP. However, we expect language models to be much more generalist than any vision model (have you ever seen a vision model that performs well on both discriminative and generative tasks across domains without finetuning?). I believe this is where scale is the enabling factor.