Submitted by _learn_faster_ t3_1194vcc in MachineLearning
What's the best way to improve the inference speed of a Flan-T5 model?
ONNX Runtime doesn't seem to work for T5 models & TorchScript also doesn't seem to speed it up (not sure why!)
LetterRip t1_j9ker51 wrote
See this tutorial: it converts the model to ONNX on CPU, then to TensorRT, for a 3-6x speedup.
https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/
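If the full TensorRT pipeline is too involved, a lighter-weight option is Hugging Face Optimum, which wraps ONNX Runtime and handles the separate encoder/decoder export that makes naive T5-to-ONNX conversion fail. A minimal sketch, assuming a recent `optimum` install (the model name and prompt are just examples; on older optimum versions the export flag is `from_transformers=True` instead of `export=True`):

```python
# pip install optimum[onnxruntime] transformers
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "google/flan-t5-small"  # example checkpoint; swap in your Flan-T5 size

# Export the encoder/decoder to ONNX and load them under ONNX Runtime
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate as you would with a regular transformers model
inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The exported model keeps the usual `generate()` interface, so it drops into existing code; any further gains from TensorRT come on top of this, per the tutorial above.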