Submitted by twocupv60 t3_xvem36 in MachineLearning
HennesTD t1_ir1b0gg wrote
I don't quite get the idea behind training on a smaller subset of the data, although it might just be that it doesn't work in my case.
In my specific case I tried training an ASR model on Librispeech. Training it on 1/10th of the 360h Librispeech data gave me pretty much the exact same loss curve in the first hours of training, so there was no better HP setting I could have spotted earlier. It does get through more epochs in that time, yes, but seeing a real difference between the curves of two HP settings took basically the same amount of time.
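For concreteness, here is a minimal sketch of how a 1/10th subset comparison like this might be set up. It assumes torchaudio's LIBRISPEECH dataset wrapper and a local data directory; the seed and the hypothetical `my_collate` function are illustrative, not from the original setup.

```python
import torch
from torch.utils.data import Subset, DataLoader
import torchaudio

# Load the full 360h clean training split (assumes torchaudio is installed
# and the data can be downloaded to ./data).
full_train = torchaudio.datasets.LIBRISPEECH(
    "./data", url="train-clean-360", download=True
)

# Take a random 1/10th subset for quick hyperparameter comparison.
generator = torch.Generator().manual_seed(0)  # seed is arbitrary here
indices = torch.randperm(len(full_train), generator=generator)[: len(full_train) // 10]
subset = Subset(full_train, indices.tolist())

# Each HP setting would then be trained on `subset` instead of `full_train`, e.g.:
# loader = DataLoader(subset, batch_size=8, collate_fn=my_collate)
# (a custom collate_fn is needed because the audio clips have variable length)
```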