generatorman_ai t1_jc5u7w2 wrote
Reply to comment by kittenkrazy in [R] Stanford-Alpaca 7B model (an instruction tuned version of LLaMA) performs as well as text-davinci-003 by dojoteef
Wow, 392 gigs for batch size 1? This is for 7B? That is an order of magnitude more than I was expecting. Sounds like even with full memory optimizations, we're far away from the 16 GB goal.
Good idea on the LoRA - since it's a completely separate set of weights, I don't see how it could come under the license. In fact, LoRAs do work on weights different from the base model they were trained on (e.g. LoRAs trained on base Stable Diffusion work when applied to heavily fine-tuned SD models), so it's not even necessarily tied to the LLaMA weights.
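For what it's worth, here's a minimal sketch of what "separate set of weights" looks like in practice with the Hugging Face peft library - the adapter is just its own directory of weights loaded on top of whatever compatible base you point it at (paths below are illustrative placeholders, not real checkpoints):

```python
# Sketch: loading a separately-distributed LoRA adapter onto a base model with peft.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Nothing in the adapter files ties it to this exact checkpoint; any model with the
# same architecture (e.g. a fine-tuned variant of the base) can work here.
base = AutoModelForCausalLM.from_pretrained("path/to/base-model")

# The LoRA weights stay in their own small adapter directory and are applied on top
# of the frozen base weights, not merged into them.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
```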
kittenkrazy t1_jc5v4is wrote
Training a LoRA should be significantly cheaper, especially combined with DeepSpeed CPU offloading and training with the model in 8-bit. You can probably get it to train on consumer cards.
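Roughly, the setup could look like this with peft plus bitsandbytes 8-bit loading (the model path, rank, and target modules are illustrative assumptions, not a tested recipe; DeepSpeed offloading would be configured separately through the trainer):

```python
# Sketch: LoRA fine-tuning with the base model loaded in 8-bit (bitsandbytes + peft).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "path/to/llama-7b",   # base weights stay frozen during LoRA training
    load_in_8bit=True,    # 8-bit weights via bitsandbytes
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # casts norms, enables gradient checkpointing

config = LoraConfig(
    r=8,                                   # low-rank adapter dimension (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections in LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small adapter matrices require grads
```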
And yup, they stay completely separate unless you decide to merge them into the main model weights for faster inference, training another LoRA on top, etc.
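Merging is basically a one-liner in peft if you do want it (again just a sketch, with placeholder paths):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Folds the LoRA deltas into the base weights and returns a plain transformers model,
# so inference no longer pays the adapter overhead and a fresh LoRA can be trained on top.
merged = model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```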
Hopefully people will share LoRAs for all sorts of plug-and-play personalities and fine-tuned abilities, and it'll be like Stable Diffusion but with personal assistants.
generatorman_ai t1_jc5vc5r wrote
Probably I'm misinterpreting - you mean you used a batch size of 1 per GPU across 8 GPUs, so it's really about 49 GB per GPU (392 / 8) with no optimizations except fp16. That sounds more reasonable, though probably still several gigs too large for 16 GB even with common optimizations.