Comments

dfcHeadChair t1_jbau8dy wrote

If you’re only detecting speech, simple cases are doable with heuristics and some napkin math, or with an MLP. However, “detect speech in this audio” is rarely the end of the story in the real world. Next up come transcription, sentiment analysis, tonal feature flagging, etc., all of which are currently dominated by Transformers. You’ll also see some great work in the RNN space, but Transformer-based architectures are king right now.

Some models for inspiration: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=downloads
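
To make the "heuristics and napkin math" route a bit more concrete, here is a minimal sketch of a frame-level speech detector that uses only short-time energy and zero-crossing rate. The function names, the 25 ms frame size, and both thresholds are illustrative assumptions, not tuned values, and the test signal is synthetic (silence followed by a 200 Hz tone standing in for voiced speech).

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech(samples, sample_rate=16000, frame_ms=25,
                  energy_threshold=0.01, max_zcr=0.25):
    """Return one boolean per non-overlapping frame.

    A frame is flagged as speech when it is loud enough and not too
    noise-like. Both thresholds are assumed values for illustration.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        loud = frame_energy(frame) > energy_threshold
        tonal = zero_crossing_rate(frame) < max_zcr
        flags.append(loud and tonal)
    return flags

# Synthetic input: 0.5 s of silence, then 0.5 s of a 200 Hz tone.
silence = [0.0] * 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(8000)]
flags = detect_speech(silence + tone)
print(sum(flags), "of", len(flags), "frames flagged as speech")
```

On real recordings, fixed thresholds like these tend to break with background noise or varying microphone gain, which is roughly where a learned model starts to pay off.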

2

alexilas OP t1_jbazarg wrote

Thanks!! I really appreciate it. I really like the AI world, and if it's not too much to ask, I'd appreciate anything else you'd recommend for going further. Again, thanks!!

2

MatureKit t1_jbb8bgo wrote

Check out the WaveNet paper for some ideas about this!

2

incrapnito t1_jca4u2d wrote

Use scikit-learn's MLPClassifier if you have to use an MLP.
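
A minimal sketch of what that could look like, assuming each audio frame has already been reduced to two features (energy and zero-crossing rate). The feature values here are randomly generated placeholders rather than real audio, and the hidden layer size and iteration count are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical per-frame features: [energy, zero-crossing rate].
# Synthetic clusters stand in for features extracted from real audio.
speech = rng.normal(loc=[0.8, 0.1], scale=0.05, size=(200, 2))
silence = rng.normal(loc=[0.05, 0.4], scale=0.05, size=(200, 2))
X = np.vstack([speech, silence])
y = np.array([1] * 200 + [0] * 200)  # 1 = speech, 0 = no speech

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling the inputs first generally helps MLP training converge.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

With real recordings you would replace the synthetic arrays with features computed per frame (for example MFCCs), and the accuracy on this toy data says nothing about how well the approach works on actual speech.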

2