extopico t1_jc5revh wrote
Reply to comment by gwern in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Flan-T5 is good, and flan-t5-xl runs well on a 3060 in 8-bit mode. It's not meant to be a chatbot, however, which is why it doesn't stir up as much excitement. T5 is best used for tasks and for training it to handle specific domains. That makes it far more interesting to me than LLaMA, which cannot be trained (yet) by us randoms.
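For reference, a minimal sketch of what loading flan-t5-xl in 8-bit mode can look like with the Hugging Face transformers and bitsandbytes libraries (the checkpoint name google/flan-t5-xl and the example prompt are illustrative, and whether it fits in a 12 GB 3060 depends on the rest of your setup):

```python
# Minimal sketch: Flan-T5-XL loaded with 8-bit weights on a single consumer GPU.
# Assumes transformers, accelerate, and bitsandbytes are installed.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place layers on the available GPU/CPU
    load_in_8bit=True,   # quantize weights to 8-bit via bitsandbytes
)

# Flan-T5 is prompted with a task instruction rather than chat turns.
prompt = "Translate English to German: The house is small."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```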
generatorman_ai t1_jc5vsbw wrote
T5 is below the zero-shot phase transition crossed by GPT-3 175B (and presumably by LLaMA 7B). Modern models with instruction tuning and human-feedback (HF) finetuning will not need further task-specific finetuning for most purposes.