HennesTD t1_ir1b0gg wrote
Reply to [D] How do you go about hyperparameter tuning when network takes a long time to train? by twocupv60
I don't quite get the idea behind training on a smaller subset of the data, although it might just be that it doesn't work in my case.

In my specific case I tried training an ASR model on LibriSpeech. Training on 1/10th of the 360h train set gave me pretty much the exact same loss curve over the first hours of training, so no HP setting stood out any earlier than it would have on the full data. The smaller subset does get through more epochs in that time, yes, but to see a real difference between the curves of two HP settings it still took basically the same wall-clock time.
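For context, this is roughly the kind of subsampling I mean; a minimal sketch assuming a PyTorch + torchaudio setup (the dataset path, seed, and fraction are illustrative, not necessarily my exact configuration):

    # Hypothetical sketch: fixed random 1/10th subset of LibriSpeech
    # train-clean-360, assuming PyTorch + torchaudio are available.
    import torch
    from torch.utils.data import Subset
    from torchaudio.datasets import LIBRISPEECH

    full = LIBRISPEECH(root="data", url="train-clean-360", download=True)

    # Fix the seed so every HP trial trains on the *same* tenth;
    # otherwise the curves differ because of the data draw, not the HPs.
    g = torch.Generator().manual_seed(0)
    indices = torch.randperm(len(full), generator=g)[: len(full) // 10].tolist()
    subset = Subset(full, indices)  # hand this to the usual DataLoader/train loop

The fixed seed is the one detail that matters here: if each trial draws a different tenth, you're comparing data splits rather than HP settings.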