seacucumber3000 t1_j0fem4u wrote

When tuning hyperparameters, does the learning rate (along with its decay, schedule, etc.) depend on things like model size and activation function? Or can I search for the ideal model architecture first and tune the learning rate afterward?
