Submitted by MahmoudAbdAlghany t3_zg71j1 in deeplearning
suflaj t1_izfke61 wrote
There are none, unless you plan on emulating them, which you'd have to do yourself.
The available quantization widths correspond to what the hardware is capable of doing, and hardware generally revolves around widths that have bytes as their base length.
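To make the "emulate it yourself" point concrete, here is a minimal pure-Python sketch of storing 4-bit values on byte-based hardware by packing two signed int4 values (range -8..7) into each byte. The function names are mine, not from any library; real kernels would do this packing in registers, not Python objects.

```python
def pack_int4(values):
    """Pack a list of signed 4-bit ints (-8..7) into bytes, two per byte."""
    assert all(-8 <= v <= 7 for v in values)
    if len(values) % 2:                  # pad to an even count
        values = values + [0]
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        # low nibble holds the first value, high nibble the second
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)

def unpack_int4(data, count):
    """Inverse of pack_int4: recover `count` signed 4-bit ints."""
    out = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return out[:count]

# Two int4 weights now occupy one byte of storage:
print(unpack_int4(pack_int4([-8, 7, 3]), 3))  # → [-8, 7, 3]
```

The storage halves relative to int8, but every load still has to unpack nibbles before computing, which is exactly the overhead that makes sub-byte widths unattractive without dedicated hardware paths.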
Remi_Coulom t1_izfukyo wrote
NVIDIA's tensor cores support 4-bit, 2-bit, and 1-bit operations. I am very surprised that no popular library takes advantage of this possibility. Here is a three-year-old blog post about using 4-bit inference: https://developer.nvidia.com/blog/int4-for-ai-inference/
suflaj t1_izfw75x wrote
They do, but they use bigger registers, so unless you can hand-optimize your code to pack operations together, you will see no benefit from them. That would mean writing your own CUDA kernels, at the very least.
Furthermore, 8-bit is already often too small to be stable, so why go lower? If you want garbage outputs, you could always run the task on a smaller model instead. It's easier to cut the model size in half and use 8-bit, or to a quarter and use 16-bit, than to make 4-bit or lower work.
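A rough illustration of why narrower widths get unstable: symmetric uniform quantization of the same weights at different bit widths, comparing the round-trip error. This is a hypothetical pure-Python sketch (my own helper, not any library's API), but the scaling behavior is the point: the quantization step is max|x| / (2^(bits-1) - 1), so every bit you drop roughly doubles the error.

```python
import random

def quantize_roundtrip(xs, bits):
    """Quantize to signed `bits`-bit ints and back; return max abs error."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(x) for x in xs) / qmax
    dequant = [round(x / scale) * scale for x in xs]
    return max(abs(a - b) for a, b in zip(xs, dequant))

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]  # stand-in for a layer
for bits in (16, 8, 4, 2):
    print(f"{bits:2d}-bit  max error = {quantize_roundtrip(weights, bits):.4f}")
```

Going from 8-bit to 4-bit shrinks the integer range from 255 levels to 15, so the error jumps by more than an order of magnitude, and that is before accumulation effects across layers.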
At this point in time, TensorRT seems to be the best you'll get for as little involvement as possible. Based on benchmarks, it also seems to outperform INT4 precision by a significant margin. The only drawback is its license, which implicitly prevents commercial use.
horselover_f4t t1_izibm6r wrote
Can I ask you what you mean by "implicitly prevents"?
https://github.com/NVIDIA/TensorRT/blob/main/LICENSE seems to permit commercial use, do you refer to trademarks?
suflaj t1_izihg01 wrote
That is only the code license for the open-source portion. TensorRT itself is proprietary software, and you also have to agree to its SDK license: https://docs.nvidia.com/deeplearning/tensorrt/sla/index.html
In there, ownership is phrased so ambiguously that a company's legal team would probably never greenlight using it.
horselover_f4t t1_izik5mz wrote
I will have to check that out, thank you!