_learn_faster_ OP t1_ja6zovh wrote on February 27, 2023 at 8:34 AM
Reply to comment by machineko in [D] Faster Flan-T5 inference by _learn_faster_
We have GPUs (e.g. A100) but can only use one GPU per request (not multi-GPU). We are also willing to take a bit of an accuracy hit. Let me know what you think would be best for us. When you say compression, do you mean things like pruning and distillation?
_learn_faster_ OP t1_j9nuqe3 wrote on February 23, 2023 at 8:35 AM
Reply to comment by guillaumekln in [D] Faster Flan-T5 inference by _learn_faster_
For Flan-T5, does this only work for translation tasks?
[D] Faster Flan-T5 inference
Submitted by _learn_faster_ t3_1194vcc on February 22, 2023 at 4:59 PM in MachineLearning · 8 comments
LPT: Your memory is SO MUCH more powerful than you think… we were just never taught to use it properly at school. Learning techniques like “Memory Palaces” will let you learn anything FAR faster
Submitted by _learn_faster_ t3_z583rl on November 26, 2022 at 1:55 PM in LifeProTips · No comments
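The compression techniques mentioned in the thread (pruning, distillation, and, relatedly, quantization) all trade a small accuracy hit for faster or lighter inference. As a minimal, hedged illustration of one of them (not specific to Flan-T5, and not necessarily what machineko had in mind), unstructured magnitude pruning with PyTorch's `torch.nn.utils.prune` zeroes the smallest weights of a layer; the toy `Linear` layer below stands in for one projection of a transformer block:

```python
import torch
import torch.nn.utils.prune as prune

# Toy stand-in for a single projection layer in a transformer block.
layer = torch.nn.Linear(64, 64)

# Unstructured L1 (magnitude) pruning: zero out the 50% smallest-magnitude weights.
prune.l1_unstructured(layer, name="weight", amount=0.5)
prune.remove(layer, "weight")  # bake the mask into the weight tensor permanently

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2f}")
```

Note that zeroed weights only translate into wall-clock speedups when the runtime or hardware can exploit the sparsity; on dense GPU kernels, int8/fp16 quantization is usually the easier win on a single A100.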