Amazing_Painter_7692 OP t1_jbzov27 wrote
Reply to comment by stefanof93 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
https://github.com/qwopqwop200/GPTQ-for-LLaMa
Performance is quite good.
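
For a rough sense of why 4-bit weights let 13b fit in under 9 GiB, here's a back-of-the-envelope sketch. It's my own arithmetic, not something from the GPTQ repo: it assumes a nominal 13e9 parameters and counts the weights only. Quantization scales and zero points, activations, and the KV cache come on top, so real usage will be higher than this.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal "13b" parameter count; the exact count differs slightly.
n_params = 13e9

fp16_gib = weight_memory_gib(n_params, 16)  # half-precision baseline
int4_gib = weight_memory_gib(n_params, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

So the weights drop to a quarter of their fp16 size, which leaves some headroom under 9 GiB for everything else.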