Submitted by Helveticus99 t3_zvb6l5 in MachineLearning
shadow_fax1024 t1_j1pxmnn wrote
I used a plain CNN, with and without attention. I had to handle long audio files in training as well as in inference.
Helveticus99 OP t1_j1qjdxl wrote
Thank you u/shadow_fax1024. How did you handle audio files of different lengths? And how exactly did you handle the long audio files? I think creating Mel spectrograms over long audio files won't work.
shadow_fax1024 t1_j1scqqd wrote
You could split the file into chunks of n seconds. You need to find the value of n that fits your dataset; for mine, 4-second chunks were good enough. You could also run a peak detector first, then cut a chunk spanning n/2 seconds on either side of each peak, with some overlap between windows so that you don't lose information.
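A minimal sketch of the two chunking strategies described above, assuming a mono signal given as a list of float samples at sample rate `sr`. The function names `fixed_chunks` and `peak_chunks`, the 50% overlap default, the amplitude threshold, and the simple local-maximum peak detector are all hypothetical choices for illustration, not the commenter's actual code.

```python
from typing import List, Sequence, Tuple


def fixed_chunks(signal: Sequence[float], sr: int, chunk_sec: float = 4.0,
                 hop_sec: float = 2.0) -> List[List[float]]:
    """Split a 1-D signal into chunk_sec windows starting every hop_sec.

    hop_sec < chunk_sec gives overlapping windows (assumed 50% by default).
    The last window is zero-padded so no tail samples are dropped.
    """
    size = int(round(chunk_sec * sr))
    hop = int(round(hop_sec * sr))
    chunks = []
    start = 0
    while True:
        chunk = list(signal[start:start + size])
        chunk += [0.0] * (size - len(chunk))  # pad the final, shorter chunk
        chunks.append(chunk)
        if start + size >= len(signal):  # this window already reaches the end
            break
        start += hop
    return chunks


def peak_chunks(signal: Sequence[float], sr: int, chunk_sec: float = 4.0,
                threshold: float = 0.5) -> Tuple[List[int], List[List[float]]]:
    """Cut chunk_sec windows centred on amplitude peaks (n/2 s either side).

    Hypothetical naive peak detector: a sample counts as a peak if its
    absolute value is >= threshold and >= both neighbours. Peaks closer
    than half a window to the previous peak are skipped, so neighbouring
    windows can still overlap by up to half their length.
    """
    half = int(round(chunk_sec * sr / 2))
    peaks: List[int] = []
    for i in range(1, len(signal) - 1):
        a = abs(signal[i])
        if a >= threshold and a >= abs(signal[i - 1]) and a >= abs(signal[i + 1]):
            if not peaks or i - peaks[-1] >= half:
                peaks.append(i)
    chunks = []
    for p in peaks:
        # Zero-pad where the window runs past either end of the file.
        window = [signal[j] if 0 <= j < len(signal) else 0.0
                  for j in range(p - half, p + half)]
        chunks.append(window)
    return peaks, chunks
```

Because every chunk has the same number of samples, each one can be turned into a Mel spectrogram of identical shape, which sidesteps the variable-length problem raised in the question. In practice a more robust peak detector (e.g. one operating on an energy envelope rather than raw samples) would likely be preferable to the naive one assumed here.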