t0mkaka OP t1_iypifl5 wrote
Reply to comment by TheMrZZ0 in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka
I download the audio files from the links in the RSS feed, then generate the transcripts using Whisper. The results aren't always great, but it works most of the time.
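A minimal sketch of that pipeline: pull `enclosure` URLs out of the podcast's RSS feed, download each episode, and run it through Whisper. The function names and file naming are my own, not from the project; the `whisper` calls assume the `openai-whisper` package.

```python
# Sketch: fetch episode audio from an RSS feed, then transcribe with Whisper.
# Names here (enclosure_urls, transcribe_feed) are illustrative, not the
# project's actual code.
import urllib.request
import xml.etree.ElementTree as ET

def enclosure_urls(rss_xml: str) -> list:
    """Extract audio URLs from <enclosure> tags in an RSS feed."""
    root = ET.fromstring(rss_xml)
    return [
        enc.attrib["url"]
        for enc in root.iter("enclosure")
        if enc.attrib.get("type", "").startswith("audio")
    ]

def transcribe_feed(feed_url: str) -> None:
    """Download every episode in the feed and print a transcript snippet."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model("medium")
    with urllib.request.urlopen(feed_url) as resp:
        urls = enclosure_urls(resp.read().decode())
    for i, url in enumerate(urls):
        path = f"episode_{i}.mp3"
        urllib.request.urlretrieve(url, path)
        result = model.transcribe(path)
        print(result["text"][:200])
```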
t0mkaka OP t1_iykoh4k wrote
Reply to comment by Acceptable-Cress-374 in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka
Yes, there is no speaker diarization. Adding it would solve some problems with this model too and would make search better.
t0mkaka OP t1_iyko74k wrote
Reply to comment by thundergolfer in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka
Yes, it's on GPU. I used tiny and medium. I haven't tried large because I wanted it to run fast. I spent 3-4 days trying to parallelize, inspired by your post and also by one from AssemblyAI, which demoed a parallelized setup.
But unfortunately, I was not able to parallelize. Whisper transcribes in 30-second clips, and for each next 30-second window it passes the previous window's text as the prompt. Since podcasts aren't cut into 30-second segments, I need to carry the prompt forward in any case; I cannot transcribe the chunks independently.
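The dependency described above can be sketched as a simple loop (this is an illustration of the data flow, not Whisper's internals verbatim): each 30-second window's decode is conditioned on the previous window's text, so the windows form a sequential chain.

```python
# Why 30-second windows can't be transcribed independently: each decode
# takes the previous window's output as its prompt. `decode` stands in
# for the model's per-window inference (a hypothetical callable here).
def transcribe_sequential(windows, decode):
    """windows: list of 30 s audio chunks; decode(chunk, prompt) -> text."""
    texts = []
    prompt = ""
    for chunk in windows:
        text = decode(chunk, prompt)  # depends on the previous result...
        prompt = text                 # ...so the loop can't run in parallel
        texts.append(text)
    return " ".join(texts)
```

Because iteration `n` needs the output of iteration `n-1`, you can only parallelize across *episodes*, not within one episode.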
I deploy on vast.ai for cheap per-day GPU usage and run 2 models in parallel. GPU memory usage is low, around 30%, but GPU utilization maxes out, and throughput starts to fall beyond 2 parallel models. So I run 2 inference runs per GPU. I have only used 1 GPU so far and haven't scaled out, but it shouldn't be a tough task now.
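One way to sketch "2 inference runs per GPU" is two workers pulling episode paths from a shared queue, each with its own model copy. This is an assumption about the setup, not the project's actual code; the real `whisper` calls are shown as comments with a placeholder in their place, and I use threads here (PyTorch inference releases the GIL) where separate processes would also work.

```python
# Sketch: two inference workers sharing one GPU, each would load its own
# model copy (~30% GPU memory each, per the numbers above).
from queue import Queue
from threading import Thread

def worker(jobs: Queue, results: Queue) -> None:
    """Pull episode paths from `jobs` until a None sentinel appears."""
    # In a real run: import whisper; model = whisper.load_model("medium")
    while True:
        path = jobs.get()
        if path is None:  # sentinel: no more work for this worker
            break
        # Real run: text = model.transcribe(path)["text"]
        text = f"transcript of {path}"  # placeholder for the sketch
        results.put((path, text))

def run_two_workers(paths):
    """Fan episode paths out to 2 workers; collect (path, text) results."""
    jobs, results = Queue(), Queue()
    workers = [Thread(target=worker, args=(jobs, results)) for _ in range(2)]
    for w in workers:
        w.start()
    for p in paths:
        jobs.put(p)
    for _ in workers:       # one sentinel per worker
        jobs.put(None)
    for w in workers:
        w.join()
    return dict(results.get() for _ in paths)
```

Scaling to more GPUs would just mean running this same pool once per machine, since the episodes are independent of each other.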
t0mkaka OP t1_iypil3i wrote
Reply to comment by thundergolfer in [Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable. by t0mkaka
https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/
Here. Search for the same link on Reddit's r/MachineLearning and you will find the original post as well.