029187 OP t1_iqp6s79 wrote
Reply to comment by dasayan05 in [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam? by 029187
what if, as another poster said, we did full batch but also injected noise into it?
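To make the idea concrete, here's a minimal sketch of what I mean in PyTorch — full-batch gradient descent with Gaussian noise added to the gradients. The model, data, and noise scale `sigma` are all placeholders, not from any paper:

```python
import torch

model = torch.nn.Linear(10, 1)                       # hypothetical model
X, y = torch.randn(1000, 10), torch.randn(1000, 1)   # hypothetical full dataset
loss_fn = torch.nn.MSELoss()
lr, sigma = 0.1, 0.01                                # noise scale is a free hyperparameter

for step in range(100):
    model.zero_grad()
    loss_fn(model(X), y).backward()                  # gradient over the *entire* dataset
    with torch.no_grad():
        for p in model.parameters():
            noise = sigma * torch.randn_like(p.grad) # explicit injected noise
            p -= lr * (p.grad + noise)               # noisy full-batch update
```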
dasayan05 t1_iqp8hnf wrote
possible. but what is the advantage of that? even if we did find a way to explicitly noise the data/gradient, we are still better off with mini-batches since they consume less memory
029187 OP t1_iqrinm2 wrote
If it's only as good, then it has no benefit. But if it ends up being better, then it's useful in situations where we have enough memory.

https://arxiv.org/abs/2103.17182

This paper claims to have found some interesting ways that might make it better.
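For a flavor of what that line of work looks like, here's a rough sketch of one ingredient in that family: an explicit gradient-norm penalty added to the full-batch loss, i.e. minimizing L(θ) + λ‖∇L(θ)‖². The model, data, and penalty weight `lam` below are hypothetical placeholders, and the paper's actual recipe involves more than just this:

```python
import torch

model = torch.nn.Linear(10, 1)
X, y = torch.randn(1000, 10), torch.randn(1000, 1)
loss_fn = torch.nn.MSELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
lam = 0.01  # penalty weight, chosen arbitrarily here

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    # create_graph=True lets us backprop through the gradient norm itself
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    penalty = sum(g.pow(2).sum() for g in grads)     # squared gradient norm
    (loss + lam * penalty).backward()
    opt.step()
```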
Red-Portal t1_iqpczjb wrote
People have tried it, and so far no one has been able to achieve the same effect. It's still something of an open research problem.
029187 OP t1_iqpigzv wrote
ah cool! do you have any links to papers on the topic? i'd love to read them!
Red-Portal t1_iqpipq6 wrote
I think it was this one: https://arxiv.org/abs/2103.17182
029187 OP t1_iqrihvc wrote
thanks!!