Submitted by pommedeterresautee t3_10xp54e in MachineLearning
whata_wonderful_day t1_j7ubutx wrote
Reply to comment by blackkettle in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
His point is that it's identical: they didn't use quantization or anything else that would hurt accuracy. The Whisper paper has a lot of the details you're asking for.
blackkettle t1_j7ud34i wrote
Are you talking about this paper:
- https://cdn.openai.com/papers/whisper.pdf
Maybe I missed it, but I can't find any place in that paper where they discuss the trade-offs between real-time factor (RTF) and decoding strategies. RTF-vs-accuracy curves for CPU vs GPU in STT typically differ not in absolute performance but in where along the RTF curve you reach a given accuracy. That determines what kinds of tasks you can expect to use the model for, and how you can expect to scale it to real-world applications. So far this has been the weakest point of all the Whisper-related work (you're still better off with espnet, k2, speechbrain, etc.). This information would be interesting to see if they have it.
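For readers unfamiliar with the metric: RTF is just processing time divided by audio duration, so RTF < 1 means faster than real time. A minimal sketch of how you might measure it per decoding run, assuming `transcribe` is a hypothetical stand-in for any STT call (e.g. a Whisper model's decode function):

```python
import time

def measure_rtf(transcribe, audio, audio_duration_s):
    """Run one transcription and return (transcript, RTF).

    transcribe        -- hypothetical callable: audio -> text
    audio             -- whatever input the callable expects
    audio_duration_s  -- length of the audio clip in seconds
    """
    start = time.perf_counter()
    transcript = transcribe(audio)
    elapsed = time.perf_counter() - start
    # RTF = wall-clock processing time / audio duration
    return transcript, elapsed / audio_duration_s
```

Sweeping this over decoding strategies (e.g. beam sizes) on CPU and GPU is what produces the RTF-vs-accuracy curves discussed above.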