Submitted by faschu t3_1035xzs in MachineLearning
jakderrida t1_j300s6g wrote
Quantization-aware training: PyTorch provides a set of APIs for quantization-aware training, which simulates the effects of quantization during training so the model learns to compensate for them; this often yields higher-quality quantized models than post-training approaches. You can find more information about quantization-aware training in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#quantization-aware-training).
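A minimal eager-mode sketch of what that workflow can look like (the toy model, layer sizes, `"fbgemm"` qconfig, and dummy training loop are illustrative assumptions, not a definitive recipe):

```python
import torch
import torch.nn as nn

# Toy model with QuantStub/DeQuantStub marking where tensors enter/leave int8.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")  # x86 backend
torch.quantization.prepare_qat(model, inplace=True)  # insert fake-quant modules

# Ordinary training loop; observers learn scales/zero-points as you train.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    loss = model(torch.randn(8, 128)).sum()  # dummy loss, just for illustration
    opt.zero_grad()
    loss.backward()
    opt.step()

quantized = torch.quantization.convert(model.eval())  # swap in real int8 modules
```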
Post-training static quantization: PyTorch also provides APIs for post-training static quantization, which quantizes an already-trained model after a calibration pass over representative data to determine activation ranges. You can find more information about post-training static quantization in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#post-training-static-quantization).
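The eager-mode flow is roughly prepare → calibrate → convert. A small sketch under the same illustrative assumptions (random tensors stand in for a real calibration set):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(128, 10)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = Net().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 CPU backend
torch.quantization.prepare(model, inplace=True)  # attach observers

# Calibration pass: observers record activation ranges (use real data in practice).
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(1, 128))

torch.quantization.convert(model, inplace=True)  # replace modules with int8 versions
```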
Dynamic quantization: PyTorch also supports dynamic quantization, in which weights are quantized ahead of time and activations are quantized on the fly at runtime. This can be useful for applications where the model needs to be deployed on devices with limited memory or computational resources. You can find more information about dynamic quantization in the PyTorch documentation (https://pytorch.org/docs/stable/quantization.html#dynamic-quantization).
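This one is essentially a one-liner in eager mode; a minimal sketch (the model and the choice of `{nn.Linear}` as the module types to quantize are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Weights are converted to int8 up front; activation scales are computed at runtime.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
print(quantized)  # Linear layers are now DynamicQuantizedLinear
```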
faschu OP t1_j30wfpm wrote
Thanks for the reply. But do these three quantization techniques work on the GPU without TensorRT? The supported backends led me to believe they do not: https://pytorch.org/docs/stable/quantization.html#backend-hardware-support