t0mkaka OP t1_iyko74k wrote on December 2, 2022 at 2:33 AM

Reply to comment by thundergolfer in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka

Yes, it's on GPU. I used the tiny and medium models. I haven't tried large because I wanted it to run fast. I spent 3-4 days trying to parallelize, inspired by your post and by one by Assembly, who demoed a parallelized version.

But unfortunately, I was not able to parallelize. Whisper works on 30-second clips, and for each clip it passes the text from the previous 30 seconds as a prompt. Since podcasts are not cut into 30-second pieces, I needed to feed in the prompt in any case, so I cannot transcribe the clips independently.
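
To make the dependency concrete, here is a minimal sketch of the loop as described above. It is not Whisper's actual code: `decode_window` is a hypothetical stand-in for the model call, and the fixed 30-second step is a simplification. The point is only that window N cannot start until window N-1 has produced its text.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE = 16_000   # Whisper works on 16 kHz audio
WINDOW_SECONDS = 30    # clip length described in the comment above


def split_into_windows(samples: Sequence[float],
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> List[Sequence[float]]:
    """Cut the audio into consecutive fixed-length windows (last one may be shorter)."""
    step = sample_rate * window_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def transcribe_sequentially(samples: Sequence[float],
                            decode_window: Callable[[Sequence[float], str], str]) -> List[str]:
    """Decode windows one by one, feeding each window's text to the next as its prompt.

    `decode_window(window, prompt)` is a hypothetical placeholder for the model call.
    """
    texts: List[str] = []
    prompt = ""  # the first window has no previous text
    for window in split_into_windows(samples):
        text = decode_window(window, prompt)
        texts.append(text)
        prompt = text  # this is the serial dependency that blocks parallelism
    return texts
```

For what it's worth, the openai-whisper `transcribe` function has a `condition_on_previous_text` option that controls this prompting. Turning it off makes the windows independent, but the text can become less consistent from one window to the next.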

I deploy on vast.ai for cheap GPU usage for a day and run 2 models in parallel. GPU memory usage is low, around 30%, but GPU compute utilization goes to 100%, and speed begins to fall beyond 2 parallel models. So I run 2 inference runs per GPU. I have used only 1 GPU so far and have not scaled out, but that should not be a tough task now.
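
A rough sketch of that setup, assuming the parallelism is across episodes rather than inside one episode: split the episode list into 2 shards and give each shard to its own process, with each process loading its own copy of the model. `shard_episodes`, `run_workers`, and `worker_fn` are made-up names for illustration, not the actual script.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence


def shard_episodes(paths: Sequence[str], num_workers: int = 2) -> List[List[str]]:
    """Deal episode paths round-robin into one shard per worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [list(paths[i::num_workers]) for i in range(num_workers)]


def run_workers(paths: Sequence[str],
                worker_fn: Callable[[List[str]], list],
                num_workers: int = 2) -> List[list]:
    """Run `worker_fn` on each shard in a separate process.

    `worker_fn` is a hypothetical top-level function that loads the model once
    and transcribes every file in its shard. It must be picklable, so it
    cannot be a lambda or a nested function.
    """
    shards = shard_episodes(paths, num_workers)
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker_fn, shards))
```

With 2 workers, `shard_episodes(["a", "b", "c", "d", "e"])` gives `[["a", "c", "e"], ["b", "d"]]`. Going above 2 workers only helps if the GPU still has compute headroom, which was not the case here.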

thundergolfer t1_iyocw2v wrote on December 2, 2022 at 10:04 PM

Thanks for the details.

> one by Assembly who demoed with parallelized.

What was this demo? Got a link?

t0mkaka OP t1_iypil3i wrote on December 3, 2022 at 3:36 AM

https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/

Here. Search for the same link on Reddit r/MachineLearning and you will find the original post as well.