fasttosmile t1_iqrolwa wrote
Reply to comment by ClearlyCylindrical in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
There was for a while the belief that the stochasticity was key for good performance (one paper supporting the hypothesis from 2016). Your framing makes it sound like that is still the case - you suggest no other reason for not doing full batch descent - and I think it's important to point out it's not.
Viewing a single comment thread. View all comments