[Project] I used whisper to transcribe 2500 episodes from around 80 podcasts and made it searchable.
Submitted by t0mkaka t3_z9ps9k in MachineLearning
Hi all,
This is similar to some other posts about transcribing podcast episodes.
I downloaded the episodes, transcribed them with Whisper models, and then made the transcripts full-text searchable.
The architecture is simple: RSS -> Download -> Transcribe -> Segment -> Ingest to DB for search.
Where the full transcript is available, I also use Wink NLP to automatically highlight important segments of the podcast.
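To make the RSS -> Segment -> Ingest steps concrete, here is a minimal Python sketch. It is not the code behind the site, only an illustration under a few assumptions: the feed is standard RSS with `<enclosure>` tags, the transcript arrives as a list of dicts with `start`, `end`, and `text` keys (the shape of the `segments` list that Whisper's `transcribe` returns), and a small in-memory inverted index stands in for the real search DB. The names `episode_audio_urls`, `group_segments`, and `TranscriptIndex` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def episode_audio_urls(rss_xml):
    """Return the audio URL from each <item>'s <enclosure> tag in an RSS feed."""
    root = ET.fromstring(rss_xml)
    urls = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            urls.append(enclosure.get("url"))
    return urls


def group_segments(segments, max_seconds=30.0):
    """Merge consecutive transcript segments into chunks of roughly max_seconds.

    A single segment longer than max_seconds becomes its own chunk.
    """
    chunks = []
    current = None
    for seg in segments:
        text = seg["text"].strip()
        if current is not None and seg["end"] - current["start"] <= max_seconds:
            # Still within the time budget: extend the current chunk.
            current["end"] = seg["end"]
            current["text"] += " " + text
        else:
            # Start a new chunk at this segment.
            current = {"start": seg["start"], "end": seg["end"], "text": text}
            chunks.append(current)
    return chunks


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """Toy inverted index (assumption: a stand-in for the real search DB)."""

    def __init__(self):
        self.chunks = []                   # chunk_id -> (episode_id, chunk)
        self.postings = defaultdict(set)   # token -> set of chunk_ids

    def add(self, episode_id, chunk):
        chunk_id = len(self.chunks)
        self.chunks.append((episode_id, chunk))
        for token in tokenize(chunk["text"]):
            self.postings[token].add(chunk_id)

    def search(self, query):
        """Return chunks containing every query token, in insertion order."""
        tokens = tokenize(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.chunks[i] for i in sorted(ids)]
```

Each search hit keeps its `start` and `end` timestamps, so a result can link straight to the matching point in the episode audio. In a real setup the index would be replaced by the database's own full-text search.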
Here is the URL: https://www.castdop.com
I can add around 1400 hours of content per day.
Any feedback, comments, or questions are appreciated.
P.S.: Let me know if this violates any rules; I posted because I saw similar posts before.
Acceptable-Cress-374 t1_iyjflu4 wrote
I've been meaning to play around with whisper, but never got the time. Does it do any kind of voice / person segmentation as well? Can it tell speakers apart, say in a high quality input such as a podcast?