
kkg_scorpio t1_jbz91de wrote on March 12, 2023 at 9:39 PM

Reply to comment by Upstairs_Suit_9464 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692

Check out the terms "quantization aware training" and "post training quantization".

8-bit, 4-bit, 2-bit, hell, even 1-bit inference are all scenarios that are extremely relevant for edge devices.
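
For anyone who wants a feel for what "post training quantization" means in practice, here is a minimal sketch, assuming the simplest scheme: symmetric, per-tensor, round-to-nearest. The function names `quantize_symmetric` and `dequantize` are made up for illustration and are not from any library; the 4-bit LLaMA work discussed in the post may well use a more sophisticated method.

```python
def quantize_symmetric(weights, bits):
    """Round float weights to signed integers that fit in `bits` bits.

    Illustrative sketch only: one scale for the whole tensor, symmetric
    range [-qmax, qmax], plain round-to-nearest.
    """
    if bits < 2:
        raise ValueError("this symmetric scheme needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp after rounding so float error can never push a value out of range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


weights = [0.9, -1.0, 0.3, 0.0]
q4, scale4 = quantize_symmetric(weights, bits=4)
approx = dequantize(q4, scale4)
```

With this scheme the reconstruction error per weight is at most half of `scale`, so fewer bits means a coarser grid and a larger worst-case error. Quantization aware training differs in that the rounding is simulated during training so the network can adapt to it.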

Taenk t1_jbzaeau wrote on March 12, 2023 at 9:49 PM

Isn't 1-bit quantisation qualitatively different, since you can do optimizations that are only available when the parameters are fully binary?
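
One well-known example of such an optimization is replacing the multiply-accumulate in a dot product with XNOR plus a population count. The sketch below assumes both weights and activations are restricted to -1 and +1 and are packed one per bit (bit set means +1); `pack_signs` and `binary_dot` are hypothetical helper names used only for illustration.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int: bit i is 1 where values[i] > 0."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {-1, +1}, given as packed bits.

    Positions where the bits agree contribute +1, positions where they differ
    contribute -1, so the result is matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask    # XNOR, limited to the low n bits
    matches = bin(agree).count("1")      # population count
    return 2 * matches - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
```

Here `result` matches the ordinary dot product `sum(x * y for x, y in zip(a, b))`. On real hardware the same idea processes 32 or 64 elements per XNOR and popcount instruction, which is not possible once parameters take more than two values.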

AsIAm t1_jc168cw wrote on March 13, 2023 at 8:04 AM

It is. But that doesn't mean 1-bit neural nets are impossible. Even Turing himself toyed with such networks – https://www.npl.co.uk/getattachment/about-us/History/Famous-faces/Alan-Turing/80916595-Intelligent-Machinery.pdf?lang=en-GB
