Acceptable-Cress-374 t1_iyjflu4 wrote on December 1, 2022 at 9:07 PM

I've been meaning to play around with Whisper, but never found the time. Does it do any kind of voice / speaker segmentation as well? Can it tell speakers apart, say in a high-quality input such as a podcast?

forfooinbar t1_iyjzzry wrote on December 1, 2022 at 11:27 PM

Whisper doesn't do speaker diarization AFAIK. It will just be one big blob of text.

The_frozen_one t1_iykajiv wrote on December 2, 2022 at 12:46 AM

You can play around with it here: https://whisper.ggerganov.com/

It's significantly slower (roughly 50 times slower) than the natively compiled version (https://github.com/ggerganov/whisper.cpp), but you can at least get a sense of the accuracy using the online version.

t0mkaka OP t1_iykoh4k wrote on December 2, 2022 at 2:35 AM

Yes, there is no speaker diarization. Adding it would solve problems in this model as well and would make search better.
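
For anyone wondering what bolting diarization on could look like, here is a minimal sketch, not something Whisper does itself. It assumes you already have transcript segments with "start", "end" and "text" fields (times in seconds) and, separately, speaker turns from some other diarization tool. The `assign_speakers` helper, the turn format and the `SPEAKER_00` style labels are all made up for illustration. Each segment simply gets the speaker whose turn overlaps it the most.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical diarization output: (start_sec, end_sec, speaker_label)
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Dict], turns: List[Turn]) -> List[Dict]:
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments with no overlapping turn get speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for t_start, t_end, speaker in turns:
            ov = overlap(seg["start"], seg["end"], t_start, t_end)
            if ov > best_overlap:
                best_overlap = ov
                best_speaker = speaker
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Made-up example data, standing in for real transcription and diarization output.
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
    {"start": 9.0, "end": 12.0, "text": "Let's get started."},
]
turns = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 8.8, "SPEAKER_01"),
    (8.8, 12.0, "SPEAKER_00"),
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

With labels attached like this, search could be filtered per speaker instead of running over one undifferentiated blob of text. Segments that span a speaker change would still get a single label, so this is only a rough approximation.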