seacucumber3000 t1_j0fem4u wrote on December 16, 2022 at 6:07 AM

When tuning hyperparameters, does the learning rate (and its decay, schedule, etc.) depend on things like model size and activation function? Or can I search for the ideal model architecture first, then tune the learning rate afterward?