pommedeterresautee OP t1_itv3bu7 wrote
Reply to comment by programmerChilli in [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
Yeah, it doesn't make sense to me either. Also, I was expecting a bit more speedup (compared to the numbers shared on the PyTorch dev forum). I tried several combinations of params (enabling the disabled optimizations), but they were either broken (e.g., the matmul op templates) or made things slower. A sketch of the kind of toggle involved is below.
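For context, here's a minimal sketch of the kind of config toggle involved, assuming the TorchInductor config API; the exact flag name (`max_autotune`) is an assumption and may differ across PyTorch versions, since the benchmarks predate the stable `torch.compile` API:

```python
import torch
import torch._inductor.config as inductor_config

# Assumed flag: enables the autotuned matmul op templates that are
# disabled by default (the kind of optimization described as broken above).
inductor_config.max_autotune = True

@torch.compile(backend="inductor")
def matmul(x, w):
    return x @ w

x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
w = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
out = matmul(x, w)  # first call triggers compilation / autotuning
```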
Scripts are here: https://github.com/ELS-RD/kernl/tree/main/experimental/benchmarks
Let me know if you find something suspicious.