dasayan05 t1_iqp8hnf wrote

Possible, but what is the advantage? Even if we did find a way to explicitly add noise to the data or the gradients, we would still be better off with mini-batches, since they use less memory.
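To make the comparison concrete, here is a rough sketch of the two options on a toy least-squares problem. Everything in it is an assumption for illustration: the function names, the hyperparameters, the synthetic data, and the choice of plain Gaussian noise on the gradient (just one way one might "explicitly noise" it, not a method taken from any paper).

```python
import numpy as np


def make_data(n=512, d=5, seed=0):
    """Hypothetical toy regression data: y = X @ w_true + small label noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.01 * rng.standard_normal(n)
    return X, y, w_true


def full_batch_noisy_gd(X, y, lr=0.1, noise_std=0.01, steps=200, seed=0):
    """Full-batch gradient descent with Gaussian noise injected by hand."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Exact gradient: every step touches all n rows of X.
        grad = X.T @ (X @ w - y) / n
        # Explicit noise, standing in for the noise mini-batching gives for free.
        grad = grad + noise_std * rng.standard_normal(d)
        w = w - lr * grad
    return w


def minibatch_sgd(X, y, lr=0.1, batch_size=32, steps=200, seed=0):
    """Plain mini-batch SGD: the noise comes from subsampling the data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Each step only needs batch_size rows in memory.
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size
        w = w - lr * grad
    return w


if __name__ == "__main__":
    X, y, w_true = make_data()
    for name, fit in [("full-batch + noise", full_batch_noisy_gd),
                      ("mini-batch SGD", minibatch_sgd)]:
        err = np.linalg.norm(fit(X, y) - w_true)
        print(f"{name}: distance to w_true = {err:.4f}")
```

On a convex toy problem like this, both variants should land near the same solution, so it says nothing about generalization. It only shows the memory point: the full-batch version needs the whole dataset for every step, while the mini-batch version needs `batch_size` rows.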

029187 OP t1_iqrinm2 wrote

If it's only as good, then it has no benefit. But if it ends up being better, then it's useful in situations where we have enough memory.

https://arxiv.org/abs/2103.17182

This paper claims they may have found some interesting ways to make it better.