shadow_fax1024
shadow_fax1024 t1_j1sdf0y wrote
You could also look into the different approaches taken by participants in the Kaggle competition BirdCLEF. The problem there is somewhat similar to yours.
shadow_fax1024 t1_j1scqqd wrote
Reply to comment by Helveticus99 in [D] Classification task based on speech recordings by Helveticus99
You could split the file into chunks of n seconds. You need to find whichever n fits your dataset; for mine, 4-second chunks were good enough. You could also run a peak detector first and then cut a chunk spanning n/2 seconds on either side of each peak, with some overlap between windows too, so that you won't lose information.
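A minimal sketch of both ideas in plain Python, assuming mono audio already loaded as a list of float samples at sample rate `sr`. The function names, the RMS-energy peak detector, and its `frame_sec`/`threshold` defaults are hypothetical choices for illustration; the comment does not say which peak detector was actually used.

```python
import math
from typing import List, Sequence

def fixed_chunks(samples: Sequence[float], sr: int,
                 chunk_sec: float, hop_sec: float) -> List[List[float]]:
    """Split audio into chunk_sec windows, starting a new window every hop_sec.
    hop_sec < chunk_sec gives overlapping windows; the last one is zero-padded."""
    size = int(round(chunk_sec * sr))
    hop = int(round(hop_sec * sr))
    if hop <= 0:
        raise ValueError("hop_sec must be positive")
    chunks = []
    start = 0
    while True:
        chunk = list(samples[start:start + size])
        chunk += [0.0] * (size - len(chunk))  # zero-pad a short final window
        chunks.append(chunk)
        if start + size >= len(samples):
            break
        start += hop
    return chunks

def find_peaks(samples: Sequence[float], sr: int,
               frame_sec: float = 0.05, threshold: float = 0.1) -> List[int]:
    """Hypothetical peak detector: RMS energy per frame, keeping frames that are
    local maxima above `threshold`. Returns the centre sample index of each."""
    frame = max(1, int(round(frame_sec * sr)))
    energies = []
    for i in range(0, len(samples), frame):
        seg = samples[i:i + frame]
        energies.append(math.sqrt(sum(x * x for x in seg) / len(seg)))
    peaks = []
    for k, e in enumerate(energies):
        left = energies[k - 1] if k > 0 else -1.0
        right = energies[k + 1] if k + 1 < len(energies) else -1.0
        if e >= threshold and e > left and e >= right:
            peaks.append(min(k * frame + frame // 2, len(samples) - 1))
    return peaks

def peak_chunks(samples: Sequence[float], sr: int,
                chunk_sec: float, peaks: Sequence[int]) -> List[List[float]]:
    """Cut a chunk_sec window centred on each peak (chunk_sec/2 either side),
    zero-padding past the file edges. Windows of nearby peaks overlap."""
    size = int(round(chunk_sec * sr))
    half = size // 2
    chunks = []
    for p in peaks:
        start = p - half
        chunks.append([samples[i] if 0 <= i < len(samples) else 0.0
                       for i in range(start, start + size)])
    return chunks
```

For example, `fixed_chunks(audio, sr, chunk_sec=4.0, hop_sec=2.0)` gives 4-second chunks with 50% overlap. The right chunk length and overlap still have to be tuned per dataset, as the comment says.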
shadow_fax1024 t1_j1pxmnn wrote
I used a plain CNN, both with and without attention. I had to handle long audio files in training as well as in inference.
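One common way to handle long files (an assumption here, not necessarily what was done in this comment) is to run the CNN on fixed-length chunks and pool the chunk-level class probabilities into a single file-level prediction. The sketch shows plain mean pooling and an attention-style variant that weights each chunk by a softmax over a per-chunk score. In a real model those scores would come from a learned attention layer; here they are simply passed in, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

def mean_pool(chunk_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities over all chunks of one file (non-empty)."""
    n = len(chunk_probs)
    return [sum(col) / n for col in zip(*chunk_probs)]

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(chunk_probs: Sequence[Sequence[float]],
                   chunk_scores: Sequence[float]) -> List[float]:
    """Weighted average of chunk probabilities, weights = softmax(chunk_scores).
    Chunks with higher scores (e.g. containing the target sound) dominate."""
    weights = softmax(chunk_scores)
    return [sum(w * p for w, p in zip(weights, col))
            for col in zip(*chunk_probs)]
```

With equal scores, attention pooling reduces to mean pooling; the attention version helps when only a few chunks of a long recording actually contain the relevant signal.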
shadow_fax1024 t1_j1oc69w wrote
Option 1 has worked for me in the past. You may need to generate more samples using various augmentation techniques, e.g. adding a small amount of noise to the audio before taking its spectrogram, or cut-mixing a random portion of one spectrogram into another.
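A minimal sketch of the two augmentations mentioned, in plain Python. It assumes spectrograms are stored as `[freq_bins][time_frames]` nested lists of equal shape and labels are one-hot lists. The cut-mix shown is a time-axis variant (a random span of frames is pasted across all frequency bins, and labels are mixed by the pasted fraction); `noise_std`, `max_frac`, and the function names are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Spec = List[List[float]]  # [freq_bins][time_frames]

def add_noise(samples: Sequence[float], noise_std: float,
              rng: random.Random) -> List[float]:
    """Add small zero-mean Gaussian noise to the waveform, before the spectrogram."""
    return [x + rng.gauss(0.0, noise_std) for x in samples]

def time_cutmix(spec_a: Spec, label_a: Sequence[float],
                spec_b: Spec, label_b: Sequence[float],
                rng: random.Random, max_frac: float = 0.5
                ) -> Tuple[Spec, List[float]]:
    """Paste a random time span of spec_b into spec_a and mix the labels by the
    fraction of frames replaced (equal to the area fraction here)."""
    n_frames = len(spec_a[0])
    width = rng.randint(1, max(1, int(n_frames * max_frac)))
    start = rng.randint(0, n_frames - width)
    mixed = [row_a[:start] + row_b[start:start + width] + row_a[start + width:]
             for row_a, row_b in zip(spec_a, spec_b)]
    lam = width / n_frames
    label = [(1 - lam) * a + lam * b for a, b in zip(label_a, label_b)]
    return mixed, label
```

Mixing the labels in proportion to the pasted span keeps the target consistent with how much of each recording the model actually sees.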
shadow_fax1024 t1_j1so9tm wrote
Reply to comment by shadow_fax1024 in [D] Classification task based on speech recordings by Helveticus99
https://www.kaggle.com/c/birdclef-2021