guillaumekln t1_j9nfl9t wrote
Reply to [D] Faster Flan-T5 inference by _learn_faster_
You can also check out the CTranslate2 library which supports efficient inference of T5 models, including 8-bit quantization on CPU and GPU. There is a usage example in the documentation.
Disclaimer: I’m the author of CTranslate2.
guillaumekln t1_j9nv5n0 wrote
Reply to comment by _learn_faster_ in [D] Faster Flan-T5 inference by _learn_faster_
No. Even though the high-level class is named `Translator`, it can be used to run any task that would work with `T5ForConditionalGeneration` in the transformers library.