hughperman t1_ivqd886 wrote
Reply to comment by bluuerp in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Consider, though, that in a linear scheme, taking each gradient step separately is equal to taking one step along the sum of the gradients. Taking the average is equal to the sum of the gradients divided by the number of steps. So you are only adjusting the step by a scale factor of 1/N, nothing more mathemagical.
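A quick sketch of that in plain Python, if it helps. It assumes "linear scheme" means the gradients are fixed numbers that don't change as the weight moves; the function names and the toy values are made up for illustration.

```python
def sequential_steps(w, grads, lr):
    """Apply one plain gradient-descent update per gradient, in order."""
    for g in grads:
        w = w - lr * g
    return w


def summed_step(w, grads, lr):
    """Apply a single update using the sum of all the gradients."""
    return w - lr * sum(grads)


def averaged_step(w, grads, lr):
    """Apply a single update using the mean of all the gradients."""
    return w - lr * sum(grads) / len(grads)


# Toy values (assumed): the gradients are constants, independent of w.
grads = [0.5, -1.0, 2.0, 0.25]
w0, lr = 1.0, 0.1
n = len(grads)

w_seq = sequential_steps(w0, grads, lr)
w_sum = summed_step(w0, grads, lr)
w_avg = averaged_step(w0, grads, lr)

# Averaging with a learning rate N times larger recovers the summed step.
w_avg_rescaled = averaged_step(w0, grads, lr * n)

print(w_seq, w_sum, w_avg, w_avg_rescaled)
```

Up to floating-point rounding, `w_seq` and `w_sum` land in the same place, and `w_avg` only differs by the 1/N factor on the step size. Once the gradients depend on the current weights this stops being exact, since each separate step would be evaluated at a different point.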