suflaj t1_j5r5bfw wrote

For the learning rate, you should just pick a good starting point based on the batch size and architecture and relegate everything else to the scheduler and optimizer. I don't think there's much point messing with the learning rate once you find one that doesn't blow up your model; just use warmup or plateau schedulers to manage it for you after that.
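If you want warmup concretely, here's a minimal PyTorch sketch (assuming torch >= 1.10 for LinearLR; the tiny model, dummy batch, and 500-step horizon are placeholders I picked for illustration, not anything OP specified):

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LinearLR

model = nn.Linear(10, 2)  # placeholder model, not Inception
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Ramp the LR from 10% of 3e-4 up to the full 3e-4 over the
# first 500 steps, then leave it at the base value
warmup = LinearLR(optimizer, start_factor=0.1, end_factor=1.0,
                  total_iters=500)

for step in range(500):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).sum()  # dummy forward pass
    loss.backward()
    optimizer.step()
    warmup.step()  # advance the warmup schedule once per step
```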

Since you mentioned Inception, I believe that unless you are using quite big batch sizes, your starting LR should be the magical 3e-4 for Adam or 1e-2 for SGD. From there you would just use a ReduceLROnPlateau scheduler with, e.g., a patience of 3 epochs, a cooldown of 2, and a factor of 0.1, and probably employ EarlyStopping if the metric doesn't improve for 6 epochs.
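Roughly, that setup looks like this in PyTorch (ReduceLROnPlateau is PyTorch's actual class name; the placeholder model and random "validation loss" are just stand-ins so it runs, and the EarlyStopping is hand-rolled since plain PyTorch doesn't ship one):

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)  # placeholder model, not Inception
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1,
                              patience=3, cooldown=2)

best_loss, bad_epochs = float("inf"), 0
for epoch in range(100):
    # ... real training epoch goes here ...
    val_loss = torch.rand(1).item()  # stand-in for a real validation loss
    scheduler.step(val_loss)  # cuts LR by 10x after 3 stagnant epochs

    # hand-rolled EarlyStopping: quit after 6 epochs with no improvement
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= 6:
            break
```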

2