Submitted by faschu t3_1035xzs in MachineLearning
Is there a way to do quantization in native pytorch for GPUs (Cuda)?
I know that TensorRT offers this functionality, but I would prefer working with native pytorch code. I understand from the pytorch docs, https://pytorch.org/docs/stable/quantization.html, that quantization for the GPU is linked to TensorRT. Given that Nvidia GPUs have supported quantization for some time now, I find it difficult to believe that no solid implementation other than TensorRT exists. Grateful for any pointers or suggestions.
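For readers unfamiliar with what "quantization" means here: the scheme PyTorch's quantized tensors use is affine (scale/zero-point) int8 quantization. A minimal sketch of that mapping in pure Python (illustrative only, no torch dependency; function names are my own):

```python
# Illustrative sketch of affine (scale + zero-point) int8 quantization,
# the scheme PyTorch's quantized tensors are based on. Pure Python.

def qparams(xmin, xmax, qmin=-128, qmax=127):
    """Compute scale and zero-point for asymmetric int8 quantization."""
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)  # range must contain 0
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=-128, qmax=127):
    # Map a float to the nearest representable int8 value, clamped to range.
    return max(qmin, min(qmax, round(x / scale + zp)))

def dequantize(q, scale, zp):
    # Recover an approximate float from the int8 code.
    return (q - zp) * scale

vals = [-1.0, 0.0, 0.5, 2.0]
scale, zp = qparams(min(vals), max(vals))
roundtrip = [dequantize(quantize(v, scale, zp), scale, zp) for v in vals]
# Per-element round-trip error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(vals, roundtrip))
```

The point of the scale/zero-point pair is that matrix multiplies can then run in cheap int8 arithmetic, with the float range recovered afterwards.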
_Arsenie_Boca_ t1_j2xrj3r wrote
I'm not an expert here, but as far as I understand from the docs, quantization is not yet a mature feature.
I'm curious, what's the reason you don't want TensorRT?