ia3leonid t1_j7hgcoq wrote on February 6, 2023 at 8:46 PM
Reply to Why does my Transformer blow GPU memory? by beautyofdeduction
Gradients are also stored, one per weight, so they take as much memory as the weights themselves, on top of the weights and activations. Some optimisers need even more: Adam, for example, also tracks two running statistics for each weight.
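To put rough numbers on this, here is a minimal back-of-the-envelope sketch. It assumes everything is stored in fp32 (4 bytes per value), that Adam keeps two extra buffers per weight (first and second moment estimates) and that plain SGD without momentum keeps none. It deliberately leaves activations out, since those depend on batch size, sequence length and the model. The helper `training_memory_bytes` and the `OPTIMISER_STATE_PER_PARAM` table are hypothetical, written for illustration only.

```python
# Rough per-parameter memory accounting for training; a sketch, not a profiler.
# Assumptions (hypothetical helper, not from any library):
# - every tensor is fp32 (4 bytes per value)
# - activations are excluded (they depend on batch size, sequence length, model)

BYTES_FP32 = 4

# Extra per-parameter values each optimiser keeps (assumed counts).
OPTIMISER_STATE_PER_PARAM = {
    "sgd": 0,           # plain SGD without momentum: no extra buffers
    "sgd_momentum": 1,  # one momentum buffer per weight
    "adam": 2,          # first and second moment estimates per weight
}


def training_memory_bytes(num_params: int, optimiser: str = "adam") -> dict:
    """Estimate bytes for weights, gradients and optimiser state."""
    weights = num_params * BYTES_FP32
    gradients = num_params * BYTES_FP32  # one gradient value per weight
    state = num_params * OPTIMISER_STATE_PER_PARAM[optimiser] * BYTES_FP32
    return {
        "weights": weights,
        "gradients": gradients,
        "optimiser_state": state,
        "total": weights + gradients + state,
    }


if __name__ == "__main__":
    # Example: a hypothetical 100M-parameter Transformer trained with Adam.
    est = training_memory_bytes(100_000_000, "adam")
    for name, nbytes in est.items():
        print(f"{name:>16}: {nbytes / 2**30:.2f} GiB")
```

Under these assumptions, Adam training needs about 4x the memory of the weights alone before a single activation is stored. Mixed precision, sharded optimisers or other optimisers will change the per-parameter counts.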