Submitted by time_waster103 t3_xvh2ep in MachineLearning
I have a dataset where each audio file is around 30 minutes long. I need to classify the audio files into 6 categories, and inference needs to be fast: no more than 1 second per file. Most of the audio classification techniques I have come across use MFCCs or Mel spectrograms, but producing an MFCC or Mel spectrogram for the entire 30 minutes is time-consuming. So I suspect I have to classify each audio file based on short clips extracted from it. The success of the classification task would then depend on how representative the short clips are of the original audio file. Maybe the clips can be selected using audio features that aren't too expensive to compute, RMS for example. But I'm not aware of any existing work that has been done in this field. A quick Google search and a scan of Google Scholar didn't give me anything useful. It would greatly benefit me if someone could point me towards any existing work in this area.
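To make the RMS idea concrete, here is a minimal sketch of what I have in mind, using only NumPy. The function names `clip_rms` and `select_clips`, the 5-second clip length, and the rule of keeping the highest-energy clips are all illustrative assumptions on my part, not an established method; high energy does not necessarily mean a clip is representative.

```python
import numpy as np


def clip_rms(signal, sr, clip_seconds):
    """RMS energy of each non-overlapping clip (hypothetical helper)."""
    clip_len = int(sr * clip_seconds)
    n_clips = len(signal) // clip_len
    if clip_len <= 0 or n_clips == 0:
        raise ValueError("signal is shorter than one clip")
    # Drop the trailing partial clip, then view the signal as (n_clips, clip_len).
    frames = np.asarray(signal[: n_clips * clip_len], dtype=np.float64)
    frames = frames.reshape(n_clips, clip_len)
    return np.sqrt(np.mean(frames ** 2, axis=1))


def select_clips(signal, sr, clip_seconds=5.0, n_select=3):
    """Return (start, end) sample indices of the n_select highest-RMS clips.

    Assumption: "loudest" stands in for "most informative". Swap in another
    cheap score if that does not hold for your data.
    """
    rms = clip_rms(signal, sr, clip_seconds)
    clip_len = int(sr * clip_seconds)
    top = np.argsort(rms)[::-1][:n_select]  # indices of the highest-RMS clips
    starts = np.sort(top) * clip_len        # restore chronological order
    return [(int(s), int(s + clip_len)) for s in starts]
```

Only the selected clips would then go through the MFCC or Mel spectrogram step, and the per-clip predictions could be combined (for example by majority vote) into one label for the file.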
aman5319 t1_ir0v0em wrote
I would also like to know the answer to this question.