blackkettle t1_j7ud34i wrote
Reply to comment by whata_wonderful_day in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
Are you talking about this paper:
- https://cdn.openai.com/papers/whisper.pdf
Maybe I missed it, but I can't find any place in that paper where they discuss the trade-offs with respect to real-time factor (RTF) and decoding strategies. RTF-vs-accuracy curves for CPU vs GPU in STT typically differ not in absolute performance but in where along the RTF curve you reach a particular accuracy. That determines what kinds of tasks you can realistically use the model for, and how you can expect to scale it to real-world applications. So far this has been the weakest point of all the Whisper-related work (you're still better off with espnet, k2, speechbrain, etc.). It would be interesting to see this information if they have it.
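For anyone unfamiliar with the metric: RTF is just decode time divided by audio duration. A minimal sketch (the numbers below are purely hypothetical, not from the paper):

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio.

    RTF < 1.0 means faster than real time. The same model can sit at very
    different points on the RTF-vs-accuracy curve depending on hardware
    (CPU vs GPU) and decoding strategy (e.g. beam size), which is exactly
    the trade-off curve missing from the paper.
    """
    return processing_seconds / audio_seconds

# Hypothetical example: 60 s of audio decoded in 15 s -> RTF of 0.25
print(real_time_factor(15.0, 60.0))  # 0.25
```

The point of plotting RTF against accuracy is that two systems with the same best-case accuracy can still be very different products if one only reaches it at RTF 2.0 and the other at 0.1.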
blackkettle t1_j7u2kd0 wrote
Reply to comment by pommedeterresautee in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
Probably my question was not well formulated. I'm not questioning whether it works; I'm just curious what the RTF-vs-accuracy trade-off actually looks like.
You report memory usage, beam sizes, and relative speedup, but it would also be interesting to see WER and the absolute RTFs.
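For reference, WER here is the standard word error rate; nothing Kernl- or Whisper-specific. A self-contained sketch via word-level Levenshtein distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words, computed with a
    word-level Levenshtein (edit distance) DP table."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

# One insertion against a 3-word reference -> WER of 1/3
print(wer("the cat sat", "the cat sat down"))
```

Reporting WER alongside absolute RTF at each beam size is what would let readers place the speedup on the accuracy curve.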
blackkettle t1_j7tyq1r wrote
This is very interesting, thanks for sharing! Do you have any more detail on RTF-vs-accuracy curves? Also, did you run this on any other datasets? LibriSpeech - even the "other" partitions - is very clean, simple data from both an acoustic and a linguistic standpoint.
It would be really interesting to see how well this holds on noisy spontaneous speech like conversations.
blackkettle t1_j6mnmbz wrote
Reply to comment by FobbingMobius in OpenAI executives say releasing ChatGPT for public use was a last resort after running into multiple hurdles — and they're shocked by its popularity by steviaplath153
I’ll be surprised if the status quo holds once doctors and lawyers start losing their jobs. I think that will be the tipping point. That group still holds some wealth and political power.
blackkettle t1_j6mk578 wrote
Reply to comment by pronyo001 in OpenAI executives say releasing ChatGPT for public use was a last resort after running into multiple hurdles — and they're shocked by its popularity by steviaplath153
I work in R&D in this space. There is a cost associated with training these things and running inference on them, with data curation, and with funding the researchers - though the latter is also paid for in large part by the public.
The data itself is entirely produced by the collective output of humanity. In the next 5-10 years these tools will begin to eliminate white collar professional jobs - it will happen. And as it does, dealing with that at a societal level will become a matter of great import.
Recognizing our collective contribution and actively directing these achievements towards a better shared future - sharing the benefits - will either make or break us IMO.
My 6 year old son will come of age in a radically different world. And I believe that we the creators have a responsibility to ensure that that world promotes better equity for all.
blackkettle t1_j6me60g wrote
Reply to comment by PRSHZ in OpenAI executives say releasing ChatGPT for public use was a last resort after running into multiple hurdles — and they're shocked by its popularity by steviaplath153
I think a more accurate analogy would be: you dig around in the kid's Legos, build a cool car from them, share it with him, get feedback, take it back, and then try to rent it to him for all future play…
blackkettle t1_j6lxn1y wrote
Reply to comment by pronunciaai in [D] What's stopping you from working on speech and voice? by jiamengial
I used to work in pronunciation modeling!
blackkettle t1_j6itdxo wrote
How familiar are you with the existing frameworks in this topic space? There's a lot of active work here; I'm curious what you're focusing on and how it addresses the shortcomings of the existing frameworks:
- https://github.com/kaldi-asr/kaldi
- https://github.com/espnet/espnet
- https://github.com/speechbrain/speechbrain
- https://github.com/NVIDIA/NeMo
- https://github.com/microsoft/UniSpeech
- https://github.com/topics/wav2vec2 [bajillions of similar]
- https://github.com/BUTSpeechFIT/VBx
This list is of course incomplete, but there is a _lot_ of active work in this space and a lot of open source. Recently you've also got larger and larger public datasets becoming available. The SOTA is really getting close to commoditization as well.
What sort of OSS intersection or area are you focusing on, and why?
blackkettle t1_j4p8nqv wrote
Reply to [R] The Predictive Forward-Forward Algorithm by radi-cho
It doesn't seem to discuss the computational advantages in any detail. How interesting is this whole FF idea at this point? I'd love to see a more detailed analysis.
So far it seems like an interesting alternative, but the "brain-inspired" angle is pushed in every article, and in terms of accuracy it always seems to land slightly below traditional backprop. A big computational improvement would seriously recommend it, but is there one? Or is it just too early to tell?
blackkettle t1_jdbsz0o wrote
Reply to comment by coreywindom in Overconsumption of water 'draining humanity's lifeblood,' UN chief says by No-Drawing-6975
Or: they clearly know how to do wildlife population management, but they can't seem to appreciate that they themselves are just another wildlife species. Too many people.