I download the audio files from the links in the RSS feed, then generate the transcripts using Whisper. The results are not always great, but it works most of the time.
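Roughly, that step could look like the sketch below. It is only an illustration, not the exact code I run: the helper names `enclosure_urls` and `transcribe_file` are made up, and it assumes the feed lists episodes as `<enclosure url="...">` tags inside `<item>` elements, which is the usual podcast RSS layout. The Whisper part uses `load_model` and `transcribe` from the `openai-whisper` package.

```python
import xml.etree.ElementTree as ET


def enclosure_urls(rss_xml: str) -> list:
    """Return the audio URLs found in <enclosure> tags of an RSS feed.

    Hypothetical helper: assumes the common podcast layout where each
    <item> carries one <enclosure url="..."> pointing at the audio file.
    """
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


def transcribe_file(path: str, model_name: str = "tiny") -> str:
    """Transcribe one downloaded audio file with Whisper.

    The import is done lazily so the RSS helper above works even on a
    machine without openai-whisper installed.
    """
    import whisper  # from the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(path)["text"]
```

Downloading each URL to disk (for example with `urllib.request.urlretrieve`) sits between the two helpers.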
Yes, it's on GPU. I used the tiny and medium models. I haven't tried large because I wanted it to run fast. I spent 3-4 days trying to parallelize, inspired by your post and by one from AssemblyAI that demoed a parallelized setup.
Unfortunately, I was not able to parallelize. Whisper works on 30-second clips, and for each clip it passes the text of the previous 30 seconds as the prompt. Since podcasts are not cut at clean 30-second boundaries, I needed to pass the prompt in any case, so I cannot transcribe the clips independently.
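To show why the clips chain together, here is a toy sketch of the dependency. It is not Whisper's actual decoding loop: `decode_window` is a hypothetical stand-in for "decode one 30-second clip given a text prompt", and the clips are plain strings so the example runs without any model.

```python
from typing import Callable, List


def transcribe_sequential(
    windows: List[str],
    decode_window: Callable[[str, str], str],
) -> List[str]:
    """Decode clips one after another, feeding each result forward.

    `decode_window(clip, prompt)` is a hypothetical callable. Clip N
    needs the text of clip N-1 as its prompt, so clip N cannot start
    until clip N-1 has finished. That is what blocks naive parallelism.
    """
    texts = []
    prompt = ""  # the first clip has no previous text
    for clip in windows:
        text = decode_window(clip, prompt)
        texts.append(text)
        prompt = text  # becomes the prompt for the next clip
    return texts


# Toy decoder: the output visibly depends on the previous output.
print(transcribe_sequential(["a", "b", "c"], lambda clip, prompt: clip + prompt))
```

As far as I know, `transcribe` in `openai-whisper` has a `condition_on_previous_text` option that turns this prompting off, but then clips that cut through a sentence lose their context, which is the problem with podcasts.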
I deploy on vast.ai for cheap GPU usage by the day and run 2 models in parallel. GPU memory usage is low, around 30%, but GPU compute utilization goes to full and speed begins to fall beyond 2 parallel models. So I run 2 inference runs per GPU. I have used only 1 GPU so far and have not scaled it, but that should not be a tough task now.
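A minimal sketch of the "2 inference runs per GPU" idea, assuming one worker process per run and whole episodes split between them (so each episode is still transcribed sequentially). The names `split_round_robin`, `transcribe_batch`, and `run_all` are made up for illustration, and this is not necessarily how my deployment is wired up.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

RUNS_PER_GPU = 2  # beyond this, throughput started to drop for me


def split_round_robin(paths, n):
    """Deal the episode paths into n batches, one batch per worker."""
    return [paths[i::n] for i in range(n)]


def transcribe_batch(paths):
    """Worker: load its own copy of the model and transcribe its batch."""
    import whisper  # imported in the worker so each process owns its CUDA context

    model = whisper.load_model("medium", device="cuda")
    return {path: model.transcribe(path)["text"] for path in paths}


def run_all(paths):
    """Run RUNS_PER_GPU workers side by side on the same GPU."""
    batches = split_round_robin(paths, RUNS_PER_GPU)
    results = {}
    # "spawn" avoids sharing CUDA state with the parent process; call
    # run_all from under an `if __name__ == "__main__":` guard.
    with ProcessPoolExecutor(
        max_workers=RUNS_PER_GPU, mp_context=mp.get_context("spawn")
    ) as pool:
        for part in pool.map(transcribe_batch, batches):
            results.update(part)
    return results
```

Scaling to more GPUs would mostly mean starting the same pair of workers per device.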
t0mkaka OP t1_iypil3i wrote
Reply to comment by thundergolfer in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka
https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/
Here. Search for the same link on Reddit's r/MachineLearning and you will find the original post as well.