gliptic t1_jd2bsc7 wrote
Reply to comment by lurkinginboston in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph
In fact, GPT-3 is 175B. But GPT-3 is old now and doesn't make effective use of those parameters.
gliptic t1_jcjpy0h wrote
Reply to comment by cipri_tom in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
What's wrong with Arveycavey ;).
gliptic t1_j99y0cp wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
RWKV can run on very little VRAM with rwkvstic streaming and 8-bit quantization. I haven't tested streaming, but I expect it's a lot slower. The 7B model sadly takes 8 GB with 8-bit quantization alone.
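Streaming trades speed for VRAM: the weights stay in CPU RAM and each layer gets copied to the GPU right before it runs, so every token pays the transfer cost. The 8 GB figure is just arithmetic: one byte per weight instead of two. A minimal sketch of per-row int8 quantization (not rwkvstic's actual code; the function names here are made up) looks like this:

```python
import torch

def quantize_per_row_int8(w: torch.Tensor):
    # One scale per output row; each weight is stored as a single byte.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate fp16 matrix right before the matmul.
    return (q.float() * scale).to(torch.float16)

# Rough memory math for a 7B-parameter model:
#   fp16: 7e9 params * 2 bytes ~= 14 GB
#   int8: 7e9 params * 1 byte  ~=  7 GB, plus scales and activations -> ~8 GB
w = torch.randn(4096, 4096)
q, s = quantize_per_row_int8(w)
print(w.nelement() * 2 / 2**20, "MiB as fp16")
print(q.nelement() * 1 / 2**20, "MiB as int8")
```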
gliptic t1_irmzeaq wrote
Reply to comment by RecklessRelentless99 in Enjoy the details. I work 16 hours edit and merge 380 RAW images of the moon and the final result was worth it by daryavaseum
The saturation is just turned up to reveal subtle differences in color. The moon is naturally almost monochromatic with any sensor.
gliptic t1_jee0fbk wrote
Reply to comment by yehiaserag in [P] Introducing Vicuna: An open-source language model based on LLaMA 13B by Business-Lead2679
Delta weights don't mean LoRA. They're just the difference (e.g. an XOR) between their new weights and the original weights, which you apply to the original LLaMA weights to recover the full model.
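As a rough sketch of the idea (hypothetical file names, assuming both checkpoints are plain fp16 state dicts; Vicuna ships its own conversion script, this is not that): publish only the difference from the base weights, and let anyone who already has LLaMA add it back.

```python
import torch

# Hypothetical paths; assumes plain fp16 state dicts.
base = torch.load("llama-13b.pth", map_location="cpu")
finetuned = torch.load("vicuna-13b.pth", map_location="cpu")

# Publisher side: ship only the difference, not the original weights.
delta = {k: finetuned[k] - base[k] for k in finetuned}
torch.save(delta, "vicuna-13b-delta.pth")

# User side: add the delta back onto your own copy of the LLaMA weights.
recovered = {k: base[k] + delta[k] for k in delta}

# An arithmetic delta is only exact up to fp16 rounding; XOR of the raw
# bit patterns is exactly reversible, which is why some releases use it.
delta_xor = {k: finetuned[k].view(torch.int16) ^ base[k].view(torch.int16)
             for k in finetuned}
recovered_xor = {k: (base[k].view(torch.int16) ^ delta_xor[k]).view(torch.float16)
                 for k in delta_xor}
```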