Submitted by ggerganov t3_y0nvqu in MachineLearning
Recently, I have been having fun re-implementing the inference of various transformer models (GPT-2, GPT-J) in pure C/C++ in order to run them efficiently on a CPU.
The latest one that I ported is OpenAI Whisper for automatic speech recognition:
https://github.com/ggerganov/whisper.cpp
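For anyone wondering what the interface looks like, here is a minimal sketch against the C API exposed by whisper.h. This is a rough illustration only: the function names are taken from the header and may differ between versions, and the model path and silent audio buffer are placeholders.

```c
// Minimal sketch of the whisper.cpp C API: load a model, run the full
// encode/decode pipeline on a buffer of 16 kHz mono f32 PCM, print the text.
// Function names follow whisper.h but may change between versions;
// the model path and the silent audio buffer are placeholders.
#include <stdio.h>
#include "whisper.h"

int main(void) {
    struct whisper_context * ctx = whisper_init_from_file("models/ggml-base.en.bin");
    if (ctx == NULL) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    struct whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // one second of silence standing in for real microphone/file audio
    static float pcmf32[WHISPER_SAMPLE_RATE] = {0};

    if (whisper_full(ctx, wparams, pcmf32, WHISPER_SAMPLE_RATE) != 0) {
        fprintf(stderr, "failed to process audio\n");
        whisper_free(ctx);
        return 1;
    }

    // the result is split into timestamped segments
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```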
With the smaller models, the performance is good enough for real-time transcription. For example, here is a demonstration of live transcription of audio from the microphone:
[video] whisper.cpp running on a MacBook Pro M1 (CPU only)
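If you want to try this yourself, the repository includes a stream example for real-time microphone transcription; per the README it is invoked along the lines of ./stream -m ./models/ggml-base.en.bin -t 8 --step 500 --length 5000 (check the current README for the exact flags).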
Hope you find this project interesting, and let me know if you have any questions about the implementation.
LetterRip t1_irt4luw wrote
You might check DeepSpeed MII, Facebook AITemplate, and Google XNNPACK to see how their CPU-optimized inference compares:
https://github.com/facebookincubator/AITemplate
https://github.com/microsoft/DeepSpeed-MII
https://github.com/google/XNNPACK