
uncooked-cookie t1_ivpnon9 wrote

The gradient doesn’t give you the optimal improvement direction; it gives you a local improvement direction.
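Here's a quick toy sketch of what I mean (my own example, just a 2-D quadratic f(x) = ½xᵀAx with its minimum at the origin): the negative gradient is the steepest direction *locally*, but it doesn't point at the minimum.

```python
import numpy as np

# Toy ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimum at the origin.
A = np.diag([1.0, 100.0])   # assumed example Hessian
x = np.array([1.0, 1.0])    # current point

grad = A @ x                                   # gradient of f at x
steepest = -grad / np.linalg.norm(grad)        # local improvement direction
to_minimum = -x / np.linalg.norm(x)            # direction of the true minimum

print("negative gradient direction:", steepest)
print("direction to the minimum:   ", to_minimum)
# The two directions differ: the gradient is only a local improvement direction.
```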

30

make3333 t1_ivqfpxu wrote

It's the first-order optimal direction.

17

Difficult_Ferret2838 t1_ivrnegq wrote

That doesn't mean anything.

−14

make3333 t1_ivroe1x wrote

Gradient descent takes the direction of the minimum, at the given step size, according to the degree-n Taylor series at that point. In neural nets we do first order, as if it were a plane; in a lot of other optimization settings they do a second-order approximation to find the optimal direction.
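Rough sketch of the difference (a toy 2-D quadratic I made up, not any particular library): the first-order step just scales the negative gradient, while the second-order (Newton) step rescales it by the inverse Hessian and, on a quadratic, lands exactly on the minimum.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 * x^T H x with Hessian H, minimum at the origin.
H = np.diag([1.0, 100.0])   # assumed example Hessian
x = np.array([1.0, 1.0])

grad = H @ x

# First order: minimize the linear (plane) approximation.
# Direction is -grad; curvature is ignored, so the step size matters a lot.
lr = 0.009                  # small enough for the stiff (100) direction
x_gd = x - lr * grad

# Second order: minimize the quadratic approximation (Newton step).
# The gradient is rescaled by the inverse Hessian.
x_newton = x - np.linalg.solve(H, grad)

print("gradient step:", x_gd)      # still far from the minimum at 0
print("Newton step:  ", x_newton)  # exactly the minimum, since f is quadratic
```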

10

Difficult_Ferret2838 t1_ivrom17 wrote

>Gradient descent takes the direction of the minimum, at the given step size, according to the degree-n Taylor series at that point.

No. Gradient descent is first order by definition.

>in a lot of other optimization settings they do a second-order approximation to find the optimal direction

It still isn't an "optimal" direction.

−3

bluuerp t1_ivpshwu wrote

Yes, I meant the optimal improvement direction at that point.

4

Spiritual-Reply5896 t1_iw8yhoi wrote

It gives you a local improvement direction, but can we straightforwardly take this metaphor of improvement in 3D and generalize it to thousands of dimensions?

Maybe it's a slightly different question, but do you happen to know where to find research on how well mathematical operations that are geometrically interpretable in low dimensions generalize to extremely high dimensions? I'm not looking for theory on vector spaces but for the intuitive aspects.

1