Remi_Coulom t1_izfukyo wrote
Reply to comment by suflaj in What framework can I use to quantize a deep learning model to specific bit-widths? by MahmoudAbdAlghany
NVIDIA's tensor cores support 4-bit, 2-bit, and 1-bit operations. I am very surprised that no popular library takes advantage of this. Here is a 3-year-old blog post about INT4 inference: https://developer.nvidia.com/blog/int4-for-ai-inference/
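In case it helps to see what a 4-bit path actually computes, here is a minimal CPU-side sketch with hypothetical helper names (not the tensor-core code path, which on the GPU goes through the experimental sub-byte WMMA fragments in `nvcuda::wmma::experimental::precision` or through CUTLASS): eight signed 4-bit values are packed into one 32-bit word, multiplied element-wise, and accumulated in 32-bit integers.

```cpp
// Reference illustration of an INT4 dot product with int32 accumulation.
// Helper names are made up for this sketch; the real GPU path uses the
// experimental WMMA sub-byte fragments (s4/u4/b1) or a library like CUTLASS.
#include <cstdint>
#include <cstdio>

// Pack eight signed 4-bit values (each in [-8, 7]) into one 32-bit word,
// lowest nibble first.
uint32_t pack_int4x8(const int8_t v[8]) {
    uint32_t w = 0;
    for (int i = 0; i < 8; ++i)
        w |= (static_cast<uint32_t>(v[i]) & 0xFu) << (4 * i);
    return w;
}

// Extract the i-th nibble and sign-extend it back to a full int.
int unpack_int4(uint32_t w, int i) {
    int nibble = (w >> (4 * i)) & 0xF;
    return (nibble ^ 0x8) - 0x8;  // 4-bit two's-complement sign extension
}

// Dot product of two packed int4x8 words with a 32-bit accumulator,
// i.e. the per-element arithmetic an INT4 matrix multiply performs.
int dot_int4x8(uint32_t a, uint32_t b, int acc) {
    for (int i = 0; i < 8; ++i)
        acc += unpack_int4(a, i) * unpack_int4(b, i);
    return acc;
}

int main() {
    int8_t a[8] = {1, -2, 3, -4, 5, -6, 7, -8};
    int8_t b[8] = {7,  6, 5,  4, 3,  2, 1,  0};
    uint32_t pa = pack_int4x8(a);
    uint32_t pb = pack_int4x8(b);
    printf("dot = %d\n", dot_int4x8(pa, pb, 0));  // prints 4
    return 0;
}
```

The point of the sketch is only that sub-byte storage halves (or better) the memory traffic per operand while the accumulator stays at full integer width, which is exactly the trade-off the tensor-core INT4/INT1 modes expose in hardware.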