kkg_scorpio t1_jbz91de wrote
Reply to comment by Upstairs_Suit_9464 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
Check out the terms "quantization aware training" and "post training quantization".
8-bit, 4-bit, 2-bit, hell, even 1-bit inference are all extremely relevant for edge devices.
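For anyone new to the terms, here is roughly what the simplest form of post-training quantization looks like. This is a minimal sketch, assuming symmetric round-to-nearest quantization with a single scale for the whole tensor. The function names `quantize_symmetric` and `dequantize` are made up for illustration. Practical 4-bit LLM schemes typically use finer-grained (per-channel or per-group) scales and more careful rounding than this.

```python
def quantize_symmetric(weights, bits):
    """Round-to-nearest symmetric quantization of floats to signed `bits`-bit ints.

    Illustrative sketch only (hypothetical helper): one scale per tensor.
    Needs bits >= 2; 1-bit weights are usually just sign(w), handled separately.
    """
    if bits < 2:
        raise ValueError("use sign binarization for 1-bit weights")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Map each weight to the nearest integer level and clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integer levels."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0, 0.9]
q4, s4 = quantize_symmetric(weights, 4)
approx = dequantize(q4, s4)
```

Fewer bits means fewer integer levels and a coarser scale, so the reconstruction error grows; quantization-aware training tries to make the network robust to that error during training instead of after it.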
Taenk t1_jbzaeau wrote
Isn't 1-bit quantisation qualitatively different, since it allows optimizations that are only possible when the parameters are fully binary?
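A commonly cited example of such an optimization: if both operands of a dot product are constrained to +1/-1, the multiply-accumulate reduces to an XOR followed by a bit count. This is a minimal pure-Python sketch, assuming the encoding "bit i is 1 where the value is +1". The helpers `pack_signs` and `binary_dot` are hypothetical names for illustration. Real kernels do the same thing on machine words using hardware popcount instructions.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 where values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("binary vectors must contain only +1/-1")
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n using XOR + popcount."""
    # Positions where the signs differ each contribute -1 to the dot product.
    mismatches = bin(a_bits ^ b_bits).count("1")
    # Matching positions contribute +1: (n - mismatches) - mismatches.
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, 1]
fast = binary_dot(pack_signs(a), pack_signs(b), len(a))
```

The point is that a whole word of weights is processed with two cheap bitwise operations instead of one multiply per weight, which has no direct analogue at 2 or more bits.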
AsIAm t1_jc168cw wrote
It is. But that doesn't mean 1-bit neural nets are impossible. Even Turing himself toyed with such networks – https://www.npl.co.uk/getattachment/about-us/History/Famous-faces/Alan-Turing/80916595-Intelligent-Machinery.pdf?lang=en-GB