bklawa t1_ir0wlyp wrote
Some ideas:
- Downsample the audio to a lower sample rate (if it is 48 kHz, perhaps try 8 kHz). How low you can go really depends on the task (music, speech, other general audio recordings...), since at 8 kHz you only keep content below 4 kHz.
- You don't need to feed the whole 30-minute spectrogram to the model for classification. An alternative is to reduce the time axis by taking the mean or max, for example, so you end up with one very small vector. Otherwise you can do the same over 1-minute segments to keep more information. Either way, this will definitely help reduce the model size.
- You can clip the portions of the audio track that are "silent", i.e. under a certain energy threshold, before applying the steps above.
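In case it helps, here is a rough sketch of the silence clipping and time-axis pooling ideas in Python with NumPy. The function names, the frame size, and the -40 dB threshold are placeholder choices of mine, so tune them for your data. For the downsampling step you would normally use a library resampler that applies an anti-aliasing filter (for example `scipy.signal.resample_poly`) rather than just dropping samples, so I left it out here.

```python
import numpy as np

def drop_silent_frames(audio, frame_len=1024, threshold_db=-40.0):
    """Drop non-overlapping frames whose RMS is below threshold_db,
    measured relative to the loudest frame (assumed convention)."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = rms.max()
    if ref == 0:
        return audio[:0]  # everything is silent
    rms_db = 20 * np.log10(np.maximum(rms, 1e-10) / ref)
    return frames[rms_db > threshold_db].reshape(-1)

def magnitude_spectrogram(audio, n_fft=512, hop=256):
    """Hann-windowed magnitude STFT, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.abs(np.fft.rfft(audio[idx] * window, axis=1))

def pool_time(spec, n_segments=1, reduce=np.mean):
    """Split the time axis into n_segments chunks and reduce each chunk
    to a single vector (use reduce=np.max for max pooling)."""
    chunks = np.array_split(spec, n_segments, axis=0)
    return np.stack([reduce(chunk, axis=0) for chunk in chunks])

# Toy example: 1 s tone, 1 s silence, 1 s tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
audio = np.concatenate([tone, np.zeros(sr), tone])

trimmed = drop_silent_frames(audio)            # silent middle removed
spec = magnitude_spectrogram(trimmed)          # (time, frequency)
features = pool_time(spec, n_segments=4)       # 4 small vectors
```

With `n_segments=1` you get a single vector for the whole recording. A larger value keeps some coarse time structure, which is the same trade-off as the 1-minute segments mentioned above.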
Hope this helps
time_waster103 OP t1_ir0xlzf wrote
Thanks for the ideas