SNAPscientist t1_iqr3sej wrote
Reply to comment by Ephemeral_Epoch in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
Capturing the distributional characteristics of high-dimensional data is very hard. In fact, if we could do that well, we might be able to use classic Bayesian techniques for many NN problems, which would be more principled and interpretable. Any noise one adds by hand is unlikely to reproduce the kind of stochasticity that sampling real data (via minibatches or similar procedures) provides. Getting the distribution wrong would likely mean poor generalization.
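To make the point about hand-added noise a bit more concrete, here is a rough sketch on a toy least-squares problem. The toy problem is my assumption, not something from the thread, and the names `minibatch_noise_covariance`, `scales`, and `anisotropy` are made up for illustration. It estimates the covariance of the minibatch gradient noise and checks how far it is from the isotropic shape you would get by adding plain Gaussian noise to the full-batch gradient.

```python
import numpy as np

def minibatch_noise_covariance(X, y, w, batch_size, n_samples, rng):
    """Empirical covariance of (minibatch gradient - full-batch gradient)
    for the loss 0.5 * mean((X @ w - y) ** 2). Hypothetical helper."""
    n = X.shape[0]
    full_grad = X.T @ (X @ w - y) / n
    deviations = np.empty((n_samples, w.size))
    for i in range(n_samples):
        # Draw a minibatch without replacement and compute its gradient.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        deviations[i] = Xb.T @ (Xb @ w - yb) / batch_size - full_grad
    return deviations.T @ deviations / n_samples

rng = np.random.default_rng(0)
n, d = 2000, 5

# Assumed toy data: features with very different scales per dimension.
scales = np.array([10.0, 3.0, 1.0, 0.3, 0.1])
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(size=n)
w = np.zeros(d)  # point at which the gradient noise is measured

cov = minibatch_noise_covariance(X, y, w, batch_size=32, n_samples=2000, rng=rng)

# Isotropic hand-added noise would give a ratio of 1 here.
eigvals = np.linalg.eigvalsh(cov)  # ascending order
anisotropy = eigvals[-1] / eigvals[0]
print("largest / smallest noise eigenvalue:", anisotropy)
```

In this setup the ratio comes out far above 1, meaning the minibatch noise is shaped by the data and by the current weights. Matching that by hand would require knowing the data distribution, which is exactly the hard part.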