ggerganov OP t1_is1tgok wrote
Reply to comment by mrpogiface in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Looks like WASM actually supports SIMD:
https://emscripten.org/docs/porting/simd.html
Will definitely give this a try when I get some free time. I will post updates here if you are interested in the progress.
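For reference, the Emscripten intrinsics look something like this minimal sketch (untested on my side; it assumes you build with the -msimd128 flag):

```c
// Hedged example: multiply two f32x4 vectors with WASM SIMD intrinsics.
// Build with: emcc -msimd128 simd_check.c -o simd_check.html
#include <stdio.h>
#include <wasm_simd128.h>

int main(void) {
    v128_t a = wasm_f32x4_splat(1.5f);  // [1.5, 1.5, 1.5, 1.5]
    v128_t b = wasm_f32x4_splat(2.0f);  // [2.0, 2.0, 2.0, 2.0]
    v128_t c = wasm_f32x4_mul(a, b);    // element-wise multiply
    printf("lane 0 = %f\n", wasm_f32x4_extract_lane(c, 0));  // prints 3.000000
    return 0;
}
```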
ggerganov OP t1_irwlluz wrote
Reply to comment by mrpogiface in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
I was thinking about this too.
Compiling the code is easy. The problem is that you need to load 75 MB of model data (this is the "tiny" model). I guess nobody would want to download 75 MB every time they load a page.
Even if we say you are OK with a 75 MB asset, the next problem is WASM not supporting SIMD. So the performance would be much worse compared to native. How much worse? Not sure.
But nevertheless - it might be fun to try and run it in the browser.
ggerganov OP t1_irw8n49 wrote
Reply to comment by CommunismDoesntWork in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Someone already provided Rust bindings to the C-style API.
ggerganov OP t1_irw8eho wrote
Reply to comment by ThisIsMyStonerAcount in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Essentially, it's the mat mul routine that I have re-implemented. It accounts for more than 90% of the computation time.
I tried using the built-in BLAS implementation that comes with the Apple Accelerate framework. My F16 mat mul performed better than cblas_sgemm, and the Accelerate framework didn't provide F16 overloads.
I didn't want to include external BLAS implementations, because I wanted an inference implementation that does not depend on anything and that you can easily build and try.
Also, a major factor was that this entire project is mostly a learning experience to understand how transformers work at a lower level and to improve my C programming and optimization skills.
One thing I noticed is that the F32 mat mul from Torch outperforms my F16 mat mul on M1 for big matrices (> 1024x1024). It seems that it uses MKL under the hood. For bigger sizes, it can be up to 3 times faster. It would be interesting to explore how this can be achieved manually.
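To give a rough idea, the half-precision inner loop boils down to a dot product like this minimal sketch (not the actual ggml kernel; it assumes ARMv8.2 FP16 support, as on the M1, and n divisible by 8):

```c
// Hedged sketch of an F16 dot product with native half-precision NEON math.
// Build with something like: clang -O3 -march=armv8.2-a+fp16 ...
#include <arm_neon.h>

float dot_f16(const __fp16 *x, const __fp16 *y, int n) {
    float16x8_t sum = vdupq_n_f16(0.0f);
    for (int i = 0; i < n; i += 8) {
        // fused multiply-add directly in half precision, 8 elements at a time
        sum = vfmaq_f16(sum, vld1q_f16(x + i), vld1q_f16(y + i));
    }
    // widen to F32 only for the final horizontal reduction
    float32x4_t lo = vcvt_f32_f16(vget_low_f16(sum));
    float32x4_t hi = vcvt_f32_f16(vget_high_f16(sum));
    return vaddvq_f32(vaddq_f32(lo, hi));
}
```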
ggerganov OP t1_irw7dy6 wrote
Reply to comment by MidnightSun_55 in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
No. I tried using Metal Performance Shaders (MPS) but was not able to utilize it properly. Here are some notes on this:
https://github.com/ggerganov/ggml/tree/master/examples/gpt-j#attempt-to-use-the-m1-gpu
ggerganov OP t1_irv1b8s wrote
Reply to comment by Fit_Schedule5951 in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Here is a comparison for Intel CPU:
https://github.com/ggerganov/whisper.cpp/issues/2#issuecomment-1257808576
Would be interesting to compare it on M1 when torch starts supporting F16.
ggerganov OP t1_irv0mle wrote
Reply to comment by justgord in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Hi, yes - I'm using SIMD intrinsics. AVX2 on x86 and NEON on ARM.
I am taking advantage of F16 floating-point arithmetic if available. Otherwise, I use F16 just as a storage type to reduce memory bandwidth.
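On x86, the storage-type trick looks roughly like this sketch (illustrative, not the exact ggml code; it assumes F16C + FMA support and n divisible by 8) - the values are stored as 16-bit halves and widened to F32 right after loading, so the arithmetic itself runs in F32:

```c
// Hedged sketch: F16 used only for storage, with F16C widening the loads to F32.
// Build with something like: gcc -O3 -mavx2 -mfma -mf16c ...
#include <immintrin.h>
#include <stdint.h>

float dot_f16_storage(const uint16_t *x, const uint16_t *y, int n) {
    __m256 sum = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        // load 8 half-precision values and convert them to single precision
        __m256 xv = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)(x + i)));
        __m256 yv = _mm256_cvtph_ps(_mm_loadu_si128((const __m128i *)(y + i)));
        sum = _mm256_fmadd_ps(xv, yv, sum);  // sum += x * y
    }
    // horizontal reduction of the 8 partial sums
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(sum),
                          _mm256_extractf128_ps(sum, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
```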
ggerganov OP t1_irv0gki wrote
Reply to comment by upperfloormaster in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Here are some benchmarks that other people did (both vs CPU and vs GPU):
- vs OpenVINO + ONNX on CPU - more than 2x faster
https://github.com/openai/whisper/discussions/208#discussioncomment-3827022
- vs PyTorch (CPU: i7 11800H, GPU: RTX 3080 Laptop):
https://github.com/ggerganov/whisper.cpp/issues/2#issuecomment-1257808576
- whisper.cpp on Xeon processor
https://github.com/ggerganov/whisper.cpp/issues/16
Also, my implementation is focused on performance on M1 chips, and it looks like most of the Python frameworks do not support the M1 properly yet, so I cannot make a proper benchmark.
Additionally, my implementation can also run the "large" model on an Android phone (Samsung A52) - it would be interesting to see how this compares with existing implementations:
https://github.com/ggerganov/whisper.cpp/issues/18#issue-1395784900
ggerganov OP t1_itcgtpx wrote
Reply to comment by mrpogiface in [P] Pure C/C++ port of OpenAI's Whisper by ggerganov
Hey, just in case you are still interested - today I finished the WASM port and the performance is actually not bad:
https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.wasm
There is a link to a live demo page where you can play with it.
Cheers!