pommedeterresautee OP t1_itv3bu7 wrote
Reply to comment by programmerChilli in [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
Yeah, it doesn't make sense to me either. Also, I was expecting a bit more speedup (compared to the numbers shared on the PyTorch dev forum). I tried several combinations of params (enabling the disabled optimizations), but they were either broken (e.g., the matmul op templates) or made things slower. A sketch of the kind of toggle involved is below.
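For context, here's a minimal sketch of the kind of config toggle involved, assuming the TorchInductor config API; the exact flag name (`max_autotune`) is an assumption and may differ across PyTorch versions, since the benchmarks predate the stable `torch.compile` API:

```python
import torch
import torch._inductor.config as inductor_config

# Assumed flag: enables the autotuned matmul op templates that are
# disabled by default (the kind of optimization described as broken above).
inductor_config.max_autotune = True

@torch.compile(backend="inductor")
def matmul(x, w):
    return x @ w

x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
w = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
out = matmul(x, w)  # first call triggers compilation / autotuning
```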
Scripts are here: https://github.com/ELS-RD/kernl/tree/main/experimental/benchmarks
Let me know if you find something suspicious.