Tart_Beginning t1_j3zub5s wrote

Is it true that learning rate matters less if you’re using an adaptive optimizer? If so, would you argue for or against using learning rate decay, and why?
