Submitted by twocupv60 t3_xvem36 in MachineLearning
neato5000 t1_ir0rkr8 wrote
You do not need to train to completion to discard hyperparameter settings that will not perform well. In general, early relative performance is a good predictor of final performance, so if, within the early stages of training, a certain hyperparameter vector is performing worse than its peers, kill it and start training with the next one.
This is roughly the logic behind population-based training.
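As a rough sketch of that early-discarding loop (the `build_model`, `train_one_epoch`, and `evaluate` helpers are hypothetical, and the probe budget and median rule are only illustrative):

```python
# Sketch: probe each HP vector briefly, keep only trials that beat the
# running median of earlier probes, then train the survivors to completion.
# build_model / train_one_epoch / evaluate are hypothetical helpers.
import random
from statistics import median

def random_hp_vector():
    return {"lr": 10 ** random.uniform(-5, -2),
            "weight_decay": 10 ** random.uniform(-6, -3)}

def early_discard_search(num_trials=20, probe_epochs=2, full_epochs=30):
    survivors = []
    for _ in range(num_trials):
        hp = random_hp_vector()
        model = build_model(hp)                     # hypothetical
        for _ in range(probe_epochs):
            train_one_epoch(model, hp)              # hypothetical
        score = evaluate(model)                     # hypothetical
        # Median-stopping style rule: discard trials that lag their peers early.
        if not survivors or score >= median(s for s, *_ in survivors):
            survivors.append((score, hp, model))
    best = None
    for score, hp, model in survivors:
        for _ in range(full_epochs - probe_epochs):
            train_one_epoch(model, hp)
        final = evaluate(model)
        if best is None or final > best[0]:
            best = (final, hp)
    return best
```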
suflaj t1_ir1fjgt wrote
In practice this is not true for modern DL models, especially those trained with modern optimization methods like Adam(W). Adam(W) can have the best performance at the start, but after that it's anyone's game until the end of training.
In other words, not only will the optimal hyperparameters probably be different, but because you typically need to switch to SGD to reach maximum performance, you will also have to retune the hyperparameters you had already accepted as optimal. Successful early training only somewhat guarantees you won't diverge; to end up with the best final weights you'll have to do an additional hyperparameter search (and there is no guarantee that your early-training checkpoint will lead to the best final weights either).
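One way to picture that second search: save an early checkpoint and re-sweep the learning rate for the SGD phase instead of reusing the AdamW one. A minimal sketch, assuming hypothetical `make_model`, `train_epochs`, and `validate` helpers and an illustrative LR grid:

```python
# Sketch: warm up with AdamW, checkpoint, then re-tune the LR for SGD.
# make_model / train_epochs / validate are hypothetical helpers.
import copy
import torch

model = make_model()                                       # hypothetical
adamw = torch.optim.AdamW(model.parameters(), lr=3e-4)
train_epochs(model, adamw, epochs=1)                       # short AdamW warm-up
checkpoint = copy.deepcopy(model.state_dict())             # early checkpoint

best = None
for sgd_lr in (1e-1, 3e-2, 1e-2, 3e-3):                    # fresh sweep for SGD
    model.load_state_dict(checkpoint)
    sgd = torch.optim.SGD(model.parameters(), lr=sgd_lr, momentum=0.9)
    train_epochs(model, sgd, epochs=20)
    score = validate(model)                                # hypothetical
    if best is None or score > best[0]:
        best = (score, sgd_lr)
```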
red_dragon t1_ir3t4b6 wrote
I'm running into this problem with Adam(W). Are there any suggestions on how to avoid it? Many of my experiments start off better than the baseline but ultimately do worse.
suflaj t1_ir4ow8t wrote
Switch to SGD after one epoch or so.
But if they do worse than the baseline, something else is likely the problem. Adam(W) does not kill performance; for some reason it just isn't as effective at reaching the best final performance as simpler optimizers.
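The switch itself is just swapping the optimizer object at an epoch boundary. A minimal PyTorch sketch (the `model`, `train_one_epoch`, and `num_epochs` names are assumed, and the learning rates are illustrative):

```python
# Sketch of the suggested schedule: one warm-up epoch with AdamW, then SGD.
# model, train_one_epoch, and num_epochs are assumed to exist.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
for epoch in range(num_epochs):
    if epoch == 1:
        # Replace the optimizer at the epoch boundary. SGD usually needs a
        # much larger LR than AdamW, so retune it rather than copying it over.
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    train_one_epoch(model, optimizer)                      # hypothetical
```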
ginsunuva t1_ir4h3p4 wrote
A higher LR will usually give better initial performance.
SatoshiNotMe t1_ir4wbon wrote
Technically, what you're talking about is early stopping of "trials" in HP tuning. PBT is different: it changes the hyperparameters during training. And yes, you can use PBT for tuning.
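For contrast, a minimal sketch of the PBT exploit/explore loop (the `init_weights`, `train_steps`, and `evaluate` helpers are hypothetical; the population size, interval count, and perturbation factors are illustrative):

```python
# Sketch of PBT: hyperparameters are mutated *during* training, with weak
# members copying (exploiting) strong ones and then perturbing (exploring).
# init_weights / train_steps / evaluate are hypothetical helpers.
import copy
import random

population = [{"hp": {"lr": 10 ** random.uniform(-4, -2)},
               "weights": init_weights(),                  # hypothetical
               "score": 0.0}
              for _ in range(8)]

for interval in range(20):
    for member in population:
        train_steps(member["weights"], member["hp"], steps=1000)  # hypothetical
        member["score"] = evaluate(member["weights"])              # hypothetical

    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    top, bottom = ranked[:2], ranked[-2:]
    for loser, winner in zip(bottom, top):
        # Exploit: copy a stronger member's weights and hyperparameters.
        loser["weights"] = copy.deepcopy(winner["weights"])
        loser["hp"] = copy.deepcopy(winner["hp"])
        # Explore: perturb the copied hyperparameters and keep training.
        loser["hp"]["lr"] *= random.choice([0.8, 1.2])
```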