Dendriform1491 t1_jbzj7zu wrote
Reply to comment by ML4Bratwurst in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Wait until you hear about the 1/2 bit.
currentscurrents t1_jc03yjr wrote
You could pack more bits in your bit with in-memory compression. You'd need hardware support for decompression inside the processor core.
Dendriform1491 t1_jc0bgxd wrote
Or make it data-free altogether.