guillaumekln t1_j9nfl9t wrote
Reply to [D] Faster Flan-T5 inference by _learn_faster_
You can also check out the CTranslate2 library which supports efficient inference of T5 models, including 8-bit quantization on CPU and GPU. There is a usage example in the documentation.
Disclaimer: I’m the author of CTranslate2.
guillaumekln t1_j9nv5n0 wrote
Reply to comment by _learn_faster_ in [D] Faster Flan-T5 inference by _learn_faster_
No. Even though the high-level class is named `Translator`, it can be used to run any task that would work with `T5ForConditionalGeneration` in the transformers library.