ribeirao t1_jc3d926 wrote
Reply to comment by farmingvillein in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
> (And then Emad releases an LLM and all bets are off...)
can you explain this part?
farmingvillein t1_jc3fqod wrote
Speculative, but Emad has heavily signaled that they will be releasing to the public an LLM.
People are doing some really cool stuff with LLaMA right now, but it all lives in a bit of a grey area, for the obvious licensing reasons (covering both the model weights and the underlying GPLv3 code).
If Emad releases a comparable LLM publicly, but with a generally permissive license (which is not a guarantee...), all of this hacker energy will immediately go into a model/platform that is suddenly (in this scenario) widely available, commercially usable (which means more people banging away at it, including with levels of compute that don't make sense for the average individual but are trivial for even a modestly funded AI startup), etc.
Further, SD has done a really good job of building a community around the successive releases, which--done right--means increased engagement (=better tooling) with each release, since authors know that they are not only investing in a model today, but that they are investing in a "platform" for tomorrow. I.e., the (idealized) open source snowball effect.
Additionally, there is a real chance that SD releases something better than LLaMA*, which will of course further accelerate adoption by parties who will then invest dollars to improve it.
This is all extra important, because there has been a lot of cool research coming out about improving models via [insert creative fine-tuning/RL method, often combined with clever use of chain-of-thought/APIs/retrieval systems/etc.]. Right now, these methods are only really leveraged against very small models (which can be fine-tuned, but still aren't that great) or using something like OpenAI as a black box. A community building up around actually powerful models will allow these techniques to get applied "at scale", i.e., into the community. This has the potential to be very impactful.
Lastly, as noted, GPT-4 (even though notionally against ToS) is going to make it (presumably) even easier to create high-quality instruction-tuning data (a rough sketch of the idea is below). That is going to get built and moved into public GPT-3-like models very, very quickly--which definitely means much faster tuning cycles, and possibly means higher-quality tuning.
(*=not because "Meta sux", to be clear, but because SD will more happily pull out all the stops--use more data, throw even more model bells & whistles at it, etc.)
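(For illustration, here is a minimal sketch of that kind of self-instruct-style data generation, assuming the `openai` Python package and API access; the prompt, model name, and seed tasks are placeholders, not any group's actual pipeline.)

```python
# Minimal sketch: use a strong teacher model to bootstrap instruction-tuning data.
# Assumes the `openai` package (pre-1.0 API) and an API key in OPENAI_API_KEY.
import json
import openai

# Hypothetical seed instructions; real pipelines start from a few hundred.
SEED_TASKS = [
    "Explain the difference between a list and a tuple in Python.",
    "Summarize the theory of evolution in two sentences.",
]

PROMPT_TEMPLATE = (
    "You are generating training data for an instruction-following model.\n"
    "Write 5 new, diverse instructions in the style of the examples below, "
    "each with a high-quality response.\n"
    'Return a JSON list of {{"instruction": ..., "output": ...}} objects.\n\n'
    "Examples:\n{examples}"
)

def generate_batch(seed_tasks):
    """Ask the teacher model for new (instruction, output) pairs."""
    prompt = PROMPT_TEMPLATE.format(
        examples="\n".join(f"- {t}" for t in seed_tasks)
    )
    resp = openai.ChatCompletion.create(
        model="gpt-4",  # teacher model; any strong instruction-tuned model works
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    # In practice the model's output may need validation/repair before json.loads.
    return json.loads(resp["choices"][0]["message"]["content"])

if __name__ == "__main__":
    pairs = generate_batch(SEED_TASKS)
    with open("instructions.jsonl", "a") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")
```

Real pipelines add deduplication and quality filtering before fine-tuning on the generated pairs, but the economics are the point: a few dollars of API calls buys a lot of tuning data.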
rolexpo t1_jc3yuyl wrote
If FB had released this under a more permissive license, they would've gotten so much goodwill from the developer community =/
gwern t1_jc42lxd wrote
And yet, they get shit on for releasing it at all (never mind in a way they knew perfectly well would leak), while no one ever seems to remember all of the other models which didn't get released at all... And ironically, Google is over there releasing Flan-T5 under a FLOSS license & free to download, as it has regularly released the best T5 models, and no one notices it exists - you definitely won't find it burning up the HN or /r/ML front pages. Suffice it to say that the developer community has never been noted for its consistency or gratitude, so optimizing for that is a mug's game.
(I never fail to be boggled at complaints about 'AI safety fearmongering is why we had to wait all these years instead of OA just releasing GPT-3', where the person completely ignores the half-a-dozen other GPT-3-scale models which are still unreleased, like most models were unreleased, for reasons typically not including safety.)
extopico t1_jc5revh wrote
Flan-T5 is good, and flan-t5-xl runs well on a 3060 in 8-bit mode. It's not meant to be a chatbot, however, which is why it doesn't stir up as much excitement. T5 is best used for specific tasks and for training it to handle specific domains. That makes it far more interesting to me than LLaMA, which cannot (yet) be trained by us randoms.
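(In case it helps anyone: a minimal sketch of loading flan-t5-xl in 8-bit with Hugging Face `transformers` plus `bitsandbytes`; the prompt is just an example, and exact VRAM use will depend on your setup.)

```python
# Minimal sketch: Flan-T5-XL in 8-bit on a consumer GPU (e.g. a 12 GB 3060).
# Assumes `transformers`, `accelerate`, and `bitsandbytes` are installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place layers on the GPU
    load_in_8bit=True,   # int8 weights via bitsandbytes
)

prompt = "Answer the question: why is the sky blue?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

From there, task-specific finetuning is the usual seq2seq recipe, which is what makes the T5 family attractive for domain work.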
generatorman_ai t1_jc5vsbw wrote
T5 is below the zero-shot phase transition that GPT-3 175B crossed (and presumably LLaMA 7B does too). Modern models with instruction tuning and human-feedback (RLHF) finetuning will not need further task-specific finetuning for most purposes.
oathbreakerkeeper t1_jc5viv0 wrote
Who is Emad? And who is SD?
nigh8w0lf t1_jc607jo wrote
Mohammad Emad Mostaque is the founder and CEO of Stability AI, which created Stable Diffusion (SD).
LetterRip t1_jc79qjb wrote
Stability.AI has been funding RWKV's training.