uncooked-cookie t1_ivpnon9 wrote
Reply to comment by bluuerp in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
The gradient doesn’t give you the optimal improvement direction, it gives you a local improvement direction.
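A quick numerical sketch of this point (a hypothetical toy example, not from the thread): on an ill-conditioned quadratic bowl, the negative gradient is a descent direction, but it does not point at the minimum.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x @ A @ x, minimized at the origin.
A = np.diag([1.0, 100.0])          # ill-conditioned Hessian
x = np.array([1.0, 1.0])

grad = A @ x                        # gradient of f at x
unit = lambda v: v / np.linalg.norm(v)

# Cosine between the steepest-descent direction and the direction
# that actually points at the minimum (the origin).
cos = unit(-grad) @ unit(-x)
print(round(cos, 2))                # 0.71 -- a descent direction, but not aimed at the minimum
```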
make3333 t1_ivqfpxu wrote
first degree optimal direction
Difficult_Ferret2838 t1_ivrnegq wrote
That doesn't mean anything.
make3333 t1_ivroe1x wrote
Gradient descent takes a step in the direction of the minimum of the degree-n Taylor approximation at that point. In neural nets we use degree one, as if the loss were locally a plane. In a lot of other optimization settings they use a second-order approximation to find the optimal direction.
Difficult_Ferret2838 t1_ivrom17 wrote
>Gradient descent takes a step in the direction of the minimum of the degree-n Taylor approximation at that point.
No. Gradient descent is first order by definition.
>in a lot of other optimization settings they do second order approx to find the optimal direction
It still isn't an "optimal" direction.
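To make the first- vs second-order contrast above concrete, here is a minimal sketch on a toy quadratic (the numbers are illustrative, not from the thread): a gradient step uses only first-order information, while a Newton step uses the Hessian and, on a quadratic, lands exactly on the minimum in one step.

```python
import numpy as np

A = np.diag([1.0, 100.0])           # Hessian of f(x) = 0.5 * x @ A @ x
x = np.array([1.0, 1.0])
grad = A @ x

# First-order: plain gradient descent step with a fixed learning rate.
x_gd = x - 0.01 * grad

# Second-order: Newton step, x - H^{-1} @ grad. For a quadratic this
# jumps straight to the minimum (the origin).
x_newton = x - np.linalg.solve(A, grad)

print(x_gd)        # [0.99 0.  ] -- slow along the flat axis, fast along the steep one
print(x_newton)    # [0. 0.]
```

Even the Newton direction is only "optimal" for the local quadratic model, which is the point being argued above.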
kksnicoh t1_ivtla47 wrote
It is optimal in first order :)
Difficult_Ferret2838 t1_ivtprrn wrote
Exactly, that is a meaningless phrase.
bluuerp t1_ivpshwu wrote
Yes, I meant the optimal improvement direction for that point.
Spiritual-Reply5896 t1_iw8yhoi wrote
It gives you a local improvement direction, but can we straightforwardly take this metaphor of improvement in 3D and generalize it to thousands of dimensions?
Maybe it's a slightly different question, but do you happen to know where to find research on how mathematical operations that are interpretable in low geometric dimensions generalize to extremely high dimensions? I'm not looking for theory on vector spaces, but for the intuitive aspects.