Submitted by pommedeterresautee t3_10xp54e in MachineLearning
programmerChilli t1_j7toust wrote
Reply to comment by pommedeterresautee in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
> The Python layer brings most of the PyTorch latency.
This actually isn't true - I believe most of the per-operator latency comes from C++.
pommedeterresautee OP t1_j7tp663 wrote
I guess you know better than me :-)
Which part? The dispatcher, or is it spread across several steps?
programmerChilli t1_j7tpwd7 wrote
Lots of things. You can see a flamegraph here: https://horace.io/img/perf_intro/flamegraph.png (taken from https://horace.io/brrr_intro.html).
The dispatcher is about 1us, but there are a lot of other things that need to happen - inferring the dtype, error checking, building the op, allocating output tensors, etc.
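The effect being described - a roughly fixed per-operator cost that dominates when each op does little work - can be illustrated with a pure-Python sketch. This is not PyTorch's actual dispatch path, just a timing experiment showing how fixed per-call overhead (argument handling, output allocation, etc.) is amortized as the workload per call grows:

```python
import timeit

def per_element_time(n, trials=2000):
    """Time an elementwise add on lists of length n (a stand-in for a
    small tensor op) and return the average cost per element.

    Each call pays a fixed overhead (function call, zip setup, result
    list allocation) regardless of n, analogous to per-operator
    overhead in a framework."""
    a = list(range(n))
    b = list(range(n))
    total = timeit.timeit(lambda: [x + y for x, y in zip(a, b)],
                          number=trials)
    return total / trials / n

small = per_element_time(10)       # overhead split over only 10 elements
large = per_element_time(10_000)   # overhead amortized over 10,000 elements
print(f"per element: small op {small:.2e} s, large op {large:.2e} s")
```

Running this, the per-element cost of the small op is noticeably higher than that of the large one, because the fixed per-call cost is spread over far fewer elements - the same reason small GPU kernels launched from a framework end up overhead-bound.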