
kkg_scorpio t1_jbz91de wrote

Check out the terms "quantization aware training" and "post training quantization".

8-bit, 4-bit, 2-bit, hell even 1-bit inference are scenarios that are extremely relevant for edge devices.

27
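To make the quantization idea concrete, here's a minimal sketch of symmetric post-training quantization of a weight tensor to int8 and back. The function names and the per-tensor scaling scheme are illustrative assumptions, not any particular library's API:

```python
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 using a single symmetric per-tensor scale.
    (Illustrative PTQ sketch; real toolchains also calibrate activations.)"""
    scale = np.abs(w).max() / 127.0  # symmetric range [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.array([[0.5, -1.2], [0.03, 0.9]], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is bounded by about scale / 2
print(np.abs(w - w_hat).max())
```

The same rounding-to-a-grid idea carries down to 4-bit and 2-bit; the grid just gets coarser, which is why quantization-aware training (simulating the rounding during training) helps at low bit widths.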

Taenk t1_jbzaeau wrote

Isn't 1-bit quantisation qualitatively different, since it enables optimizations that are only available when the parameters are fully binary?

18