Submitted by twocupv60 t3_xvem36 in MachineLearning
red_dragon t1_ir3t4b6 wrote
Reply to comment by suflaj in [D] How do you go about hyperparameter tuning when network takes a long time to train? by twocupv60
I'm running into this problem with Adam(W). Are there any suggestions on how to avoid it? Many of my experiments start off better than the baseline, but ultimately do worse.
suflaj t1_ir4ow8t wrote
Switch to SGD after 1 epoch or so
But if they do worse than the baseline, something else is likely the problem. Adam(W) doesn't kill performance; for whatever reason it just isn't as effective at reaching the best final performance as simpler optimizers.
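A minimal PyTorch sketch of that optimizer switch, assuming a standard training loop; the model, data, learning rates, and exact switch point are placeholders for illustration, not values from this thread:

```python
import torch
from torch import nn, optim

# Placeholder model and data; swap in your own.
model = nn.Linear(128, 10)
train_loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,)))
                for _ in range(100)]
criterion = nn.CrossEntropyLoss()

# Start with AdamW for fast early progress...
optimizer = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

num_epochs = 10
switch_epoch = 1  # hand off after the first epoch, per the advice above

for epoch in range(num_epochs):
    if epoch == switch_epoch:
        # ...then switch to plain SGD (with momentum) for the rest of training.
        optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```

Note that the fresh SGD optimizer starts without Adam's moment estimates, so you may want to lower the learning rate briefly after the switch to avoid a loss spike.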