_Arsenie_Boca_ t1_itwt4ls wrote
I don't think any model you can run on a single commodity GPU will be on par with GPT-3. Perhaps GPT-J, OPT-{6.7B / 13B} and GPT-NeoX-20B are the best alternatives. Some might need significant engineering (e.g. DeepSpeed) to work with limited VRAM.
deeceeo t1_itxiswn wrote
UL2 is 20b and supposedly on par with GPT-3?
_Arsenie_Boca_ t1_ityby3b wrote
True, I forgot about that one. Although getting a 20B model (NeoX-20B or UL2 20B) to run on an RTX GPU is probably a big stretch.
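The "big stretch" is easy to quantify with a weights-only back-of-the-envelope estimate (the helper below is my own sketch, not from the thread; it ignores activations, the KV cache, and framework overhead, which all add more on top):

```python
# Rough VRAM needed just to hold model weights at a given precision.
# Parameter counts are the models' public sizes; helper name is hypothetical.
def weights_vram_gb(n_params_billion, bytes_per_param=2):
    """fp16/bf16 uses 2 bytes per parameter; fp32 would double this."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for name, billions in [("GPT-J-6B", 6), ("OPT-13B", 13), ("GPT-NeoX-20B", 20)]:
    print(f"{name}: ~{weights_vram_gb(billions):.0f} GB in fp16")
```

At fp16, a 20B model needs roughly 37 GB for weights alone, well past the 24 GB on a top consumer card like an RTX 3090/4090 — hence the need for tricks like 8-bit quantization or CPU/NVMe offloading (e.g. DeepSpeed ZeRO-Inference) to squeeze these models onto one GPU.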
AuspiciousApple OP t1_itwvq1f wrote
>I don't think any model you can run on a single commodity GPU will be on par with GPT-3.
That makes sense. I'm not an NLP person, so I don't have a good intuition on how these models scale or what the benchmark numbers actually mean.
In CV, the difference between a small and a large model might be a few % accuracy on ImageNet, but even small models work reasonably well. FLAN-T5-XL seems to generate nonsense 90% of the time for the prompts I've tried, whereas GPT-3 produces great output most of the time.
Do you have any experience with these open models?
_Arsenie_Boca_ t1_ityccjh wrote
I don't think there is a fundamental difference between CV and NLP. However, we expect language models to be much more generalist than any vision model (have you ever seen a vision model that performs well on both discriminative and generative tasks across domains without finetuning?). I believe this is where scale is the enabling factor.