Dendriform1491 t1_jbzj7zu wrote
Reply to comment by ML4Bratwurst in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Wait until you hear about the 1/2 bit.
currentscurrents t1_jc03yjr wrote
You could pack more bits in your bit with in-memory compression. You'd need hardware support for decompression inside the processor core.
Dendriform1491 t1_jc0bgxd wrote
Or make it data-free altogether.