Bulky_Highlight_3352 t1_jc31exq wrote
really nice, thanks for sharing.
The license is still limited to non-commercial use, since the model is a fine-tuned LLaMA.
>We emphasize that Alpaca is intended only for academic research and any commercial use is prohibited. There are three factors in this decision: First, Alpaca is based on LLaMA, which has a non-commercial license, so we necessarily inherit this decision. Second, the instruction data is based on OpenAI's text-davinci-003, whose terms of use prohibit developing models that compete with OpenAI. Finally, we have not designed adequate safety measures, so Alpaca is not ready to be deployed for general use.
farmingvillein t1_jc37p3h wrote
> The license is still limited to non-commercial use due to model being fine-tuned LLaMA.
Yeah, but they released the source code to replicate (I'm sure they knew exactly what they were doing--license is even Apache).
If the source code is pretty clean (including training code; I haven't looked closely), presumably this e2e process will be copied and the resulting model released to the public (by someone not beholden to the original LLaMA license) within the next day or so, if not by EOD.
If the code is messy, might take a couple more days.
I'd also expect someone to follow the same process using turbo to bootstrap improvement (if they haven't already?). This should be particularly helpful for getting it to make smarter use of the entire context window in a conversation with the user.
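(For anyone unfamiliar with the bootstrap step: it's self-instruct-style data generation--query the API for completions of seed instructions, then fine-tune on the resulting pairs. A minimal sketch, assuming the openai-python ChatCompletion API of the time; the seed prompts and file names are illustrative, not Alpaca's actual pipeline:)

```python
# Self-instruct-style bootstrap sketch: generate instruction/response
# pairs from gpt-3.5-turbo ("turbo") to use as fine-tuning data.
# Seed prompts and file names are illustrative only.
import json
import openai

openai.api_key = "sk-..."  # your key here

seed_instructions = [
    "Explain gradient checkpointing in one paragraph.",
    "Write a haiku about language models.",
]

examples = []
for instruction in seed_instructions:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
        temperature=0.7,
    )
    examples.append({
        "instruction": instruction,
        "output": response["choices"][0]["message"]["content"],
    })

with open("bootstrap_data.json", "w") as f:
    json.dump(examples, f, indent=2)
```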
I'd also expect someone to do the same but mix in DAN-style prompting, so that you can natively get a chatbot that is "unleashed" (whether or not this is a good idea is a separate discussion, obviously...).
Also, you can expect all of the above to be applied across all the model sizes pretty quickly (33B and 65B might take a little longer, for $$$...but I wouldn't expect much longer).
It'll be extra fun because it will be released without acknowledgment (for licensing reasons) of using OpenAI's API to bootstrap.
Even more fun when GPT-4 is released in the next week or so (assuming it isn't pushed back b/c the SVB collapse is making things noisy) and can be used to bootstrap an even better instruction set (presumably).
tldr; things will change, quickly. (And then Emad releases an LLM and all bets are off...)
kittenkrazy t1_jc53y6c wrote
There's actually been a pull request up on the transformers repo, so it's been relatively easy to finetune/LoRA. I'm currently running a chat version of LLaMA 7B locally in 4-bit, finetuned on Anthropic's HH dataset. (You also don't need DAN or anything, but that's probably the reason for the license and for them originally only releasing to researchers.) Should be able to get the 30B running on a 24 GB VRAM card with quantization. The future is crazy. We want to release it but don't quite know how under the current license; however Stanford decides to release their model should set a precedent, though.
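(For reference, a minimal sketch of that route--the transformers LLaMA classes from the PR plus a PEFT LoRA adapter, with the base loaded in 8-bit via bitsandbytes. The model path and hyperparameters are illustrative, not the exact ones from this run:)

```python
# Sketch: attach a LoRA adapter to LLaMA for cheap finetuning.
# Assumes a transformers version with LLaMA support and the peft
# library; the model path and hyperparameters are illustrative.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base = "path/to/llama-7b"  # hypothetical converted checkpoint
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(
    base,
    load_in_8bit=True,         # bitsandbytes int8 to fit on smaller cards
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)  # freeze/cast base for int8 training

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```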
generatorman_ai t1_jc5q5z0 wrote
That's great, it's been hard to find people who are actually fine-tuning LLaMA. Would you mind sharing your experience for the benefit of the open-source community?
- Did you train the full-precision weights?
- Did you use memory optimizations like xformers, 8-bit Adam (from bitsandbytes), gradient checkpointing etc.?
- How much VRAM does it take for a batch size of 1?
- hh seems to be a preference dataset for RLHF rather than a text corpus - how did you use it as a fine-tuning dataset?
- Did you first do instruction fine-tuning (using something like FLAN or Self-Instruct) or just the hh directly?
kittenkrazy t1_jc5sesx wrote
- Used accelerate fp16 mixed precision with DeepSpeed ZeRO stage 2.
- No xformers, no 8-bit Adam (although I did test it and it works), and no gradient checkpointing on this run, but it does work.
- With a sequence length of 2048, I did a batch size of 1 on each of 8 GPUs with gradient accumulation of 4. This was on A6000s, so 48 GB of VRAM per card. I'm currently training a LoRA on the 30B with the base model in 8-bit, and can only fit a batch of 1 with a sequence length of 350. Once this run finishes, I'm going to try splitting the model across the cards so I can crank up the sequence length. I will also be training the PPO phase, so having enough VRAM will be a requirement lol.
- If you check out the trlx repo, they have some examples, including how they trained SFT and PPO on the HH dataset. It's basically that but with LLaMA (a rough sketch of the idea follows after this list): https://github.com/CarperAI/trlx/blob/main/examples/hh/sft_hh.py
- Just the hh directly. From the results, it seems like it might be enough, but I might also try instruction tuning and then running the whole process from that base. I will also be running the reinforcement learning with a LoRA, using this as an example: https://github.com/lvwerra/trl/tree/main/examples/sentiment/scripts/gpt-neox-20b_peft
- I'm also thinking that sharing LoRA weights instead of the full model might be a way around the license issue?
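(The sketch mentioned above: HH is a preference dataset, but for plain SFT you can just train on the "chosen" side of each pair as ordinary text, which is essentially what the linked trlx example does. Paths and hyperparameters here are illustrative:)

```python
# Sketch: use Anthropic's HH preference data for supervised fine-tuning
# by training only on the human-preferred ("chosen") dialogues.
# Paths and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (
    LlamaForCausalLM, LlamaTokenizer,
    Trainer, TrainingArguments, DataCollatorForLanguageModeling,
)

dataset = load_dataset("Anthropic/hh-rlhf", split="train")

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-7b")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token

def tokenize(example):
    # "chosen" holds the preferred full dialogue; treat it as plain text
    return tokenizer(example["chosen"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

model = LlamaForCausalLM.from_pretrained("path/to/llama-7b")
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-hh-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```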
generatorman_ai t1_jc5u7w2 wrote
Wow, 384 gigs for batch size 1? This is for 7B? That is an order of magnitude more than I was expecting. Sounds like even with full memory optimizations, we're far away from the 16 GB goal.
Good idea on the LoRA - since it's a completely separate set of weights, I don't see how it could come under the license. In fact, LoRAs even work on weights different from the base model they were trained on (e.g. LoRAs trained on base Stable Diffusion work when applied to heavily fine-tuned SD models), so it's not even necessarily tied to the LLaMA weights.
kittenkrazy t1_jc5v4is wrote
Training a LoRA should be significantly cheaper, especially combined with DeepSpeed CPU offloading and training with the model in 8-bit. You can probably get it to train on consumer cards.
And yup, it stays completely separate unless you decide to merge it with the main model weights for faster inference, training another LoRA on top, etc.
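(A sketch of both paths with peft - the merge call assumes a peft version that exposes it, and the paths are illustrative:)

```python
# Sketch: load a shared LoRA adapter on top of base LLaMA weights,
# optionally folding it into the base. Paths are illustrative.
from transformers import LlamaForCausalLM
from peft import PeftModel

base = LlamaForCausalLM.from_pretrained("path/to/llama-7b")

# Kept separate: the shared artifact contains only the small LoRA
# matrices, not the base LLaMA weights (the licensing point above).
model = PeftModel.from_pretrained(base, "someuser/llama-7b-hh-lora")

# Merged: bake the adapter into the base weights for faster inference
# or for stacking another LoRA on top.
model = model.merge_and_unload()
```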
Hopefully people will share LoRAs for all sorts of plug-and-play personalities and finetuned abilities, and it'll be like Stable Diffusion but with personal assistants.
generatorman_ai t1_jc5vc5r wrote
Probably I'm misinterpreting - you mean you did a batch size of 1 per GPU with 8 GPUs, so actually it's 48 GB with no optimizations (except fp16). That sounds more reasonable, though probably still several gigs too large for 16 GB, even with common optimizations.
generatorman_ai t1_jceddn2 wrote
Found this: https://github.com/tloen/alpaca-lora
ribeirao t1_jc3d926 wrote
> (And then Emad releases an LLM and all bets are off...)
can you explain this part?
farmingvillein t1_jc3fqod wrote
Speculative, but Emad has heavily signaled that they will be releasing an LLM to the public.
People are doing some really cool stuff with llama right now, but it all lives in a bit of a grey area, for the obvious reasons related to licensing (of both the model weights and the underlying gplv3 code).
If Emad releases a comparable LLM publicly, but with a generally permissive license (which is not a guarantee...), all of this hacker energy will immediately go into a model/platform that is suddenly (in this scenario) widely available, commercially usable (which means more people banging away at it, including with levels of compute that don't make sense for the average individual but are trivial for even a modestly funded AI startup), etc.
Further, SD has done a really good job of building a community around the successive releases, which--done right--means increased engagement (=better tooling) with each release, since authors know that they are not only investing in a model today, but that they are investing in a "platform" for tomorrow. I.e., the (idealized) open source snowball effect.
Additionally, there is a real chance that SD releases something better than llama*, which will of course further accelerate adoption by parties who will then invest dollars to improve it.
This is all extra important, because there has been a lot of cool research coming out about improving models via [insert creative fine-tuning/RL method, often combined with clever use of chain-of-thought/APIs/retrieval systems/etc.]. Right now, these methods are only really leveraged against very small models (which can be fine-tuned, but still aren't that great) or using something like OpenAI as a black box. A community building up around actually powerful models will allow these techniques to get applied "at scale", i.e., into the community. This has the potential to be very impactful.
Lastly, as noted, GPT-4 (even though notionally against ToS) is going to make it (presumably) even easier to create high-quality instruction tuning. That is going to get built and moved into public GPT-3-like models very, very quickly--which definitely means much faster tuning cycles, and possibly means higher-quality tuning.
(*=not because "Meta sux", to be clear, but because SD will more happily pull out all the stops--use more data, throw even more model bells & whistles at it, etc.)
rolexpo t1_jc3yuyl wrote
If FB released this under a more permissive license they would've gotten so much goodwill from the developer community =/
gwern t1_jc42lxd wrote
And yet, they get shit on for releasing it at all (never mind in a way they knew perfectly well would leak), while no one ever seems to remember all of the other models which didn't get released at all... And ironically, Google is over there releasing Flan-T5 under a FLOSS license & free to download, as it has regularly released the best T5 models, and no one notices it exists - you definitely won't find it burning up the HN or /r/ML front pages. Suffice it to say that the developer community has never been noted for its consistency or gratitude, so optimizing for that is a mug's game.
(I never fail to be boggled at complaints about 'AI safety fearmongering is why we had to wait all these years instead of OA just releasing GPT-3', where the person completely ignores the half-a-dozen other GPT-3-scale models which are still unreleased, like most models were unreleased, for reasons typically not including safety.)
extopico t1_jc5revh wrote
Flan-T5 is good, and flan-t5-xl runs well on a 3060 in 8-bit mode. It's not meant to be a chatbot, however, which is why it doesn't stir up as much excitement. T5 is best used for tasks, by training it to handle specific domains. That makes it far more interesting to me than LLaMA, which cannot be trained (yet) by us randoms.
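(Running it in 8-bit is nearly a one-liner with transformers + bitsandbytes; a minimal sketch, with the prompt purely illustrative:)

```python
# Sketch: flan-t5-xl in 8-bit on a ~12 GB card (e.g. a 3060).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xl",
    load_in_8bit=True,  # bitsandbytes int8 quantization
    device_map="auto",
)

prompt = "Translate to German: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```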
generatorman_ai t1_jc5vsbw wrote
T5 is below the zero-shot phase transition crossed by GPT-3 175B (and presumably by LLaMA 7B). Modern models with instruction and human-feedback finetuning will not need further task-specific finetuning for most purposes.
oathbreakerkeeper t1_jc5viv0 wrote
Who is emad? And who is SD?
nigh8w0lf t1_jc607jo wrote
Mohammad Emad Mostaque is the founder and CEO of Stability AI, which created Stable Diffusion (SD)
LetterRip t1_jc79qjb wrote
Stability.AI has been funding RWKV's training.
currentscurrents t1_jc3j86d wrote
> (by someone not beholden to the original LLaMA license)
That's not how software licenses work. You're still beholden to the license even if you torrented it.
I've heard some people theorize that ML models can't be copyrighted, but there's no case law on this yet so it's all speculation. I wouldn't suggest starting a business based around LLaMa until someone else has been the guinea pig.
oathbreakerkeeper t1_jc5vbgx wrote
How was OpenAI's API used to bootstrap Alpaca?
lxe t1_jc45m7r wrote
I thought LLaMA was GPL-licensed? Which isn't ideal either, but better than "research only".
Bulky_Highlight_3352 t1_jc4ajcf wrote
The inference code is; the model weights are under a separate non-commercial license.
djaym7 t1_jc32kmz wrote
This just sucks
cyvr_com t1_jc32sel wrote
Llama changed their license this morning
RabbitContrarian t1_jc34fjp wrote
They did not. Some random person is asking Meta to change it.
Atupis t1_jc35ppl wrote
Meta should do it; it would seriously affect the Microsoft-OpenAI thing and might also hurt Google down the line.
currentscurrents t1_jc39i38 wrote
Yeah, but I bet they intend to make money from it somehow. Likely by selling API access and integrating it into their products.
The metaverse would be considerably less stupid if it had language model-powered NPCs to talk to and 3D NeRFs to walk around in.
Taenk t1_jc33k5h wrote
Can you please link a source?
farmingvillein t1_jc3602d wrote
No source, they are making it up.
cyvr_com t1_jc33n6x wrote
Check git commits
Bulky_Highlight_3352 t1_jc34398 wrote
nada, last commit last week
[deleted] t1_jc382kl wrote
[deleted]
Bulky_Highlight_3352 t1_jc33l11 wrote
source?
[deleted] t1_jc33onx wrote
[deleted]
LetterRip t1_jc3864s wrote
The source code and the weights are under different licenses.
The LLaMA license in the request form appears to be the same.
Relevant part here:
> a. Subject to your compliance with the Documentation and Sections 2, 3, and 5, Meta grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta’s copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes. The foregoing license is personal to you, and you may not assign or sublicense this License or any other rights or obligations under this License without Meta’s prior written consent; any such assignment or sublicense will be void and will automatically and immediately terminate this License.
https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform
as linked from
farmingvillein t1_jc3ljxu wrote
Source code is also the same. Nothing changed.