[Discussion] If we had enough memory to always do full-batch gradient descent, would we still need RMSProp/momentum/Adam?
Submitted by 029187 (t3_xt0h2k) on October 1, 2022 at 5:01 PM in MachineLearning · 20 comments · 3 points
gdahl (t1_iqpf8j8) wrote on October 2, 2022 at 3:12 AM:
Adam is more likely to outperform steepest descent (full-batch GD) in the full-batch setting than it is to outperform SGD at batch size 1.
2 points
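To make the comparison concrete, here is a minimal sketch (not from the thread) of the two update rules being discussed: plain full-batch steepest descent versus Adam applied to the same exact gradient. Adam's extra machinery, a momentum-style first-moment estimate plus per-parameter rescaling by a running second-moment estimate, is defined on the gradient itself rather than on minibatch noise, which is why removing the noise does not trivially settle the question. The toy quadratic, learning rates, and step counts below are arbitrary illustrative choices.

```python
import numpy as np

def full_batch_gd(grad_fn, x, lr=0.1, steps=100):
    """Steepest descent: step along the exact negative gradient."""
    for _ in range(steps):
        x = x - lr * grad_fn(x)
    return x

def full_batch_adam(grad_fn, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    """Adam (Kingma & Ba, 2015) applied to the exact, full-batch gradient."""
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment (squared-gradient) estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step: coordinates with large gradients are scaled down
        # and coordinates with small gradients scaled up, relative to plain GD.
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy ill-conditioned quadratic: f(x) = 0.5 * x^T diag(1, 100) x,
# so the exact gradient is diag(1, 100) x.
scales = np.array([1.0, 100.0])
grad_fn = lambda x: scales * x
x0 = np.ones(2)
print("GD:  ", full_batch_gd(grad_fn, x0, lr=0.009))
print("Adam:", full_batch_adam(grad_fn, x0, lr=0.01))
```

The GD learning rate is kept below 2 divided by the largest curvature so the iteration stays stable, which is exactly the constraint that makes per-parameter scaling interesting on badly conditioned problems.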