remghoost7 t1_jbzqf5m wrote
Reply to comment by Amazing_Painter_7692 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Most excellent. Thank you so much! I will look into all of these.
Guess I know what I'm doing for the rest of the day. Time to make more coffee! haha.
You are my new favorite person this week.
Also, one final question, if you will. What's so unique about the 4-bit weights, and why would you prefer to run the model that way? Is it just a VRAM optimization? I'm decently versed in Stable Diffusion, but LLMs are fairly new territory for me.
My question seems to have been answered here, and it is indeed a VRAM limitation. That last link also supports 4-bit models, and it doesn't look too bad to set up. Then again, I installed A1111 when it first came out, so I learned through the garbage of that. Lol. I was wrong. Oh so wrong. haha.
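For anyone else wondering, here's a rough sketch of the arithmetic behind that VRAM limitation, assuming a round 13 billion parameters (my figure, not from the thread). It only counts the weights themselves; activations, KV cache, and framework overhead add more on top, which is why the 4-bit 13B model lands just under 9 GiB rather than at the ~6 GiB the weights alone would suggest:

```python
# Back-of-the-envelope VRAM for a 13B-parameter model at various precisions.
# Weights only -- activations, KV cache, and overhead add a few GiB in practice.

PARAMS = 13e9  # assumed parameter count for a "13b" model

for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = PARAMS * bits / 8 / (1024 ** 3)  # bits -> bytes -> GiB
    print(f"{label:>5}: ~{gib:.1f} GiB")

# fp16 needs ~24 GiB just for weights (out of reach for most consumer cards),
# while int4 needs ~6 GiB -- hence 4-bit quantization to fit 13b in <9 GiB.
```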
Yet again, thank you for your time and have a wonderful rest of your day. <3