shadow_fax1024 t1_j1so9tm wrote on December 27, 2022 at 2:16 AM

Reply to comment by shadow_fax1024 in [D] Classification task based on speech recordings by Helveticus99

https://www.kaggle.com/c/birdclef-2021

shadow_fax1024 t1_j1sdf0y wrote on December 27, 2022 at 12:48 AM

Reply to [D] Classification task based on speech recordings by Helveticus99

You could also look at the approaches taken by participants in the Kaggle competition BirdCLEF. The problem there is somewhat similar to yours.

shadow_fax1024 t1_j1scqqd wrote on December 27, 2022 at 12:43 AM

Reply to comment by Helveticus99 in [D] Classification task based on speech recordings by Helveticus99

You could split the file into chunks of n seconds. You need to find the n that fits your dataset; for mine, 4-second chunks were good enough. You could also run a peak detector first and then cut the file n/2 seconds either side of each peak, with some overlap between windows too, so that you won't lose information.
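A rough sketch of both chunking ideas, in case it helps. Everything here is an assumption for illustration: the function names, the 1-second overlap, and especially the "peak detector", which is just a local maximum of short-time energy above a threshold (a real detector such as `scipy.signal.find_peaks` on a smoothed envelope may work better). It assumes the audio is already loaded as a 1-D NumPy array.

```python
import numpy as np

def chunk_bounds(n_samples, sr, chunk_sec=4.0, overlap_sec=1.0):
    """Return (start, end) sample indices of fixed-length, overlapping chunks.

    The trailing remainder shorter than one chunk is dropped; a file shorter
    than one chunk yields a single window that the caller has to pad.
    """
    chunk = int(chunk_sec * sr)
    hop = chunk - int(overlap_sec * sr)
    if hop <= 0:
        raise ValueError("overlap_sec must be smaller than chunk_sec")
    last_start = max(n_samples - chunk, 0)
    return [(s, s + chunk) for s in range(0, last_start + 1, hop)]

def peak_centered_bounds(signal, sr, chunk_sec=4.0, frame_sec=0.05,
                         threshold_ratio=0.5):
    """Return (start, end) windows of chunk_sec centred on energy peaks.

    Assumed peak detector: a frame counts as a peak if its mean energy is a
    local maximum and at least threshold_ratio times the loudest frame.
    Windows from nearby peaks overlap naturally.
    """
    frame = int(frame_sec * sr)
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    energy = (frames ** 2).mean(axis=1)

    mid = energy[1:-1]
    is_peak = (mid > energy[:-2]) & (mid >= energy[2:]) \
        & (mid >= threshold_ratio * energy.max())
    peak_frames = np.flatnonzero(is_peak) + 1  # +1: mid starts at frame 1

    half = int(chunk_sec * sr) // 2
    latest_start = max(len(signal) - 2 * half, 0)
    bounds = []
    for f in peak_frames:
        center = int(f) * frame + frame // 2
        # Clamp so the window stays inside the file where possible.
        start = min(max(center - half, 0), latest_start)
        bounds.append((start, start + 2 * half))
    return bounds

# Example: 20 s of silence at 1 kHz with one short burst at the 10 s mark.
sr = 1000
audio = np.zeros(20 * sr)
audio[10_000:10_050] = 1.0
fixed = chunk_bounds(len(audio), sr, chunk_sec=4.0, overlap_sec=2.0)
around_peaks = peak_centered_bounds(audio, sr, chunk_sec=4.0)
chunks = [audio[s:e] for s, e in around_peaks]
```

The chunk length, overlap, and threshold are all things you would tune on your own data.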

shadow_fax1024 t1_j1pxmnn wrote on December 26, 2022 at 1:29 PM

Reply to [D] Classification task based on speech recordings by Helveticus99

I used a plain CNN, with and without attention. I had to handle long audio files in training as well as inference.

shadow_fax1024 t1_j1oc69w wrote on December 26, 2022 at 1:50 AM

Reply to [D] Classification task based on speech recordings by Helveticus99

1 has worked for me in the past. You may need to generate more samples using various techniques, such as adding a small amount of noise to the audio and then taking its spectrogram, cut-mixing a random portion of spectrograms, etc.
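A minimal sketch of those two augmentations, under some assumptions of mine: the noise is Gaussian with a hand-picked standard deviation, the cut-mix swaps a random span of time frames between two spectrograms shaped (n_freq_bins, n_frames), and the labels are mixed in proportion to the swapped width, as in the usual CutMix recipe. The function names and defaults are made up for illustration, and computing the spectrogram itself is left to whatever library you already use.

```python
import numpy as np

def add_noise(waveform, noise_std=0.005, rng=None):
    """Add small Gaussian noise to a waveform (apply before the spectrogram)."""
    if rng is None:
        rng = np.random.default_rng()
    return waveform + rng.normal(0.0, noise_std, size=waveform.shape)

def cutmix_time(spec_a, spec_b, label_a, label_b, max_frac=0.5, rng=None):
    """Paste a random span of time frames from spec_b into a copy of spec_a.

    Both spectrograms must have the same shape (n_freq_bins, n_frames).
    Labels are one-hot (or soft) vectors and are mixed by the fraction of
    frames each source contributes.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_frames = spec_a.shape[1]
    max_width = max(int(max_frac * n_frames), 1)
    width = int(rng.integers(1, max_width + 1))           # 1..max_width
    start = int(rng.integers(0, n_frames - width + 1))    # span fits inside

    mixed = spec_a.copy()
    mixed[:, start:start + width] = spec_b[:, start:start + width]

    keep = 1.0 - width / n_frames  # share of frames still from spec_a
    label = keep * np.asarray(label_a, dtype=float) \
        + (1.0 - keep) * np.asarray(label_b, dtype=float)
    return mixed, label

# Example with dummy data: an all-zeros and an all-ones "spectrogram".
rng = np.random.default_rng(0)
noisy = add_noise(np.zeros(16_000), noise_std=0.005, rng=rng)
spec_a, spec_b = np.zeros((8, 20)), np.ones((8, 20))
mixed, label = cutmix_time(spec_a, spec_b, [1, 0], [0, 1], rng=rng)
```

How much noise and how large a swapped span still leave the class recognisable depends on the data, so it is worth listening to or plotting a few augmented samples.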