generatorman_ai t1_jc5w4m9 wrote
Reply to comment by dojoteef in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
The general problem of generative NPCs seems like a subset of robotics rather than pure language models, so that still seems some way off (but Google made some progress with PaLM-E).
LLMs and Disco Elysium sounds like the coolest paper ever! I would love to follow you on Twitter to get notified when you release the preprint.
generatorman_ai t1_jc5vsbw wrote
Reply to comment by extopico in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
T5 is below the zero-shot phase transition crossed by GPT-3 175B (and presumably by LLaMA 7B). Modern models with instruction and human-feedback (RLHF) finetuning will not need further task-specific finetuning for most purposes.
generatorman_ai t1_jc5vc5r wrote
Reply to comment by generatorman_ai in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
I'm probably misinterpreting - you mean you used a batch size of 1 per GPU across 8 GPUs, so it's actually ~49 GB per GPU with no optimizations (except fp16). That sounds more reasonable, though probably still several gigs too large for 16 GB even with common optimizations.
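For a rough sanity check, here's a back-of-the-envelope estimate using the commonly cited ~16 bytes/param for full fine-tuning with mixed-precision Adam (the breakdown below is an assumption, not a figure from this thread, and it excludes activations):

```python
# Assumed per-parameter cost for mixed-precision Adam fine-tuning:
#   fp16 weights (2 B) + fp16 grads (2 B)
#   + fp32 master weights (4 B) + fp32 Adam m and v states (8 B) = 16 B/param
params = 7e9  # LLaMA 7B
total_gb = params * 16 / 1024**3
print(f"~{total_gb:.0f} GB before activations")  # ~104 GB
```

Spread over 8 GPUs with sharded optimizer states (ZeRO-style), that lands in the same ballpark as ~49 GB per GPU once activations are included.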
generatorman_ai t1_jc5u7w2 wrote
Reply to comment by kittenkrazy in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Wow, 392 gigs for batch size 1? This is for 7B? That is an order of magnitude more than I was expecting. Sounds like even with full memory optimizations, we're far away from the 16 GB goal.
Good idea on the LoRA - since it's a completely separate set of weights, I don't see how it could come under the license. In fact LoRAs do work on weights different from the base model they were trained from (e.g. LoRAs trained on base Stable Diffusion work when applied to heavily fine-tuned SD models), so it's not even necessarily tied to the LLaMA weights.
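As a toy illustration of why a LoRA is portable (a numpy sketch; the hidden size and rank here are made-up values): the adapter is just a low-rank pair of matrices stored separately, added onto any base weight of matching shape.

```python
import numpy as np

d, r = 16, 2  # toy hidden size and LoRA rank
rng = np.random.default_rng(0)

# Two different "base" weight matrices of the same shape
W_base = rng.normal(size=(d, d))
W_finetuned = rng.normal(size=(d, d))

# The LoRA adapter is the low-rank pair (A, B), stored separately
A = rng.normal(size=(d, r))
B = np.zeros((r, d))  # B starts at zero, so the adapter is initially a no-op

def apply_lora(W, A, B, scale=1.0):
    """Merge a LoRA adapter into a weight matrix of matching shape."""
    return W + scale * (A @ B)

# The same adapter applies to any weights with compatible dimensions
W1 = apply_lora(W_base, A, B)
W2 = apply_lora(W_finetuned, A, B)
assert W1.shape == W2.shape == (d, d)
```

Whether the result is any *good* on a different base model is an empirical question, but mechanically nothing ties the adapter to the original weights.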
generatorman_ai t1_jc5q5z0 wrote
Reply to comment by kittenkrazy in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
That's great, it's been hard to find people who are actually fine-tuning LLaMA. Would you mind sharing your experience for the benefit of the open-source community?
- Did you train the full-precision weights?
- Did you use memory optimizations like xformers, 8-bit Adam (from bitsandbytes), gradient checkpointing etc.?
- How much VRAM does it take for a batch size of 1?
- hh seems to be a preference dataset for RLHF rather than a text corpus - how did you use it as a fine-tuning dataset?
- Did you first do instruction fine-tuning (using something like FLAN or Self-Instruct) or just the hh directly?
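For anyone following along, here's a toy sketch of what the gradient checkpointing question refers to (plain PyTorch with a made-up stand-in model, not the poster's actual setup; the 8-bit Adam line is left as a comment since it needs bitsandbytes and a CUDA GPU):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Toy stand-in for a transformer layer stack
layers = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(8)])
x = torch.randn(4, 64, requires_grad=True)

# Gradient checkpointing: keep activations only at segment boundaries and
# recompute the rest during backward, trading compute for memory
out = checkpoint_sequential(layers, 4, x)
out.sum().backward()
assert layers[0].weight.grad is not None

# 8-bit Adam from bitsandbytes (requires CUDA), for reference:
# import bitsandbytes as bnb
# optimizer = bnb.optim.Adam8bit(layers.parameters(), lr=1e-4)
```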
generatorman_ai t1_jceddn2 wrote
Reply to comment by kittenkrazy in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Found this: https://github.com/tloen/alpaca-lora