Submitted by twocupv60 t3_xvem36 in MachineLearning
HennesTD t1_ir1b0gg wrote
I don't quite get the idea behind training on a smaller subset of the data, although it might just be that it doesn't work in my case.
In my specific case I tried training an ASR model on Librispeech. Training it on 1/10th of the 360h Librispeech data gave me pretty much the exact same loss curve in the first hours of training, so there was no better HP setting I could have spotted earlier. It does get through more epochs in that time, yes, but seeing a real difference between the curves of two HP settings took basically the same amount of time.
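For concreteness, here is a minimal sketch of how a 1/10th subset comparison like this might be set up. It assumes torchaudio's LIBRISPEECH dataset wrapper and a local data directory; the seed and the hypothetical `my_collate` function are illustrative, not from the original setup.

```python
import torch
from torch.utils.data import Subset, DataLoader
import torchaudio

# Load the full 360h clean training split (assumes torchaudio is installed
# and the data can be downloaded to ./data).
full_train = torchaudio.datasets.LIBRISPEECH(
    "./data", url="train-clean-360", download=True
)

# Take a random 1/10th subset for quick hyperparameter comparison.
generator = torch.Generator().manual_seed(0)  # seed is arbitrary here
indices = torch.randperm(len(full_train), generator=generator)[: len(full_train) // 10]
subset = Subset(full_train, indices.tolist())

# Each HP setting would then be trained on `subset` instead of `full_train`, e.g.:
# loader = DataLoader(subset, batch_size=8, collate_fn=my_collate)
# (a custom collate_fn is needed because the audio clips have variable length)
```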