LetterRip t1_jc4rifv wrote
Reply to comment by stefanof93 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Depends on the model. Some have difficulty even with full 8bit quantization; others you can go to 4bit relatively easily. There is some research that suggests 3bit might be the useful limit, with rarely certain 2bit models.
Viewing a single comment thread. View all comments