jarekduda OP t1_iyq9dj9 wrote on December 3, 2022 at 8:39 AM

Reply to comment by SufficientStautistic in [R] SGD augmented with 2nd order information from seen sequence of gradients? by jarekduda

It is regularized Gauss-Newton, which is generally quite suspicious: it approximates the Hessian with a positive definite matrix ... for an extremely non-convex function.
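
For reference, here is a sketch of what that approximation looks like for a least-squares objective. The notation (residual vector r, Jacobian J, damping λ) is assumed here for illustration and does not come from the thread:

```latex
% Objective: f(\theta) = \tfrac{1}{2}\,\lVert r(\theta) \rVert^2, with Jacobian J = \partial r / \partial \theta
\begin{align*}
\nabla f(\theta)   &= J^\top r \\
\nabla^2 f(\theta) &= J^\top J + \sum_i r_i \, \nabla^2 r_i
    && \text{(exact Hessian; the second term may be indefinite)} \\
H_{\mathrm{GN}}    &= J^\top J \succeq 0
    && \text{(Gauss-Newton drops the second term)} \\
H_{\lambda}        &= J^\top J + \lambda I \succ 0, \quad \lambda > 0
    && \text{(regularized version)} \\
\Delta\theta       &= -\left(J^\top J + \lambda I\right)^{-1} J^\top r
\end{align*}
```

The dropped term is the only one that can carry negative curvature, which is why the approximation looks suspicious on a non-convex landscape.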

How does it change the landscape of extrema?

Is it used for NN training? K-FAC uses a kind of related Fisher information approximation, which is also positive definite.

serge_cell t1_iyv2zag wrote on December 4, 2022 at 11:36 AM

3D localization, registration, and reconstruction are traditional areas of use for regularized Gauss-Newton, and all are highly non-convex. The trick is to start in a nearly convex region, sometimes after several tries, and/or convexify with regularizers and/or sensor fusion.
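
To make the regularization part concrete, here is a minimal sketch of damped Gauss-Newton (Levenberg-Marquardt style) on a toy curve fit. Everything in it is an illustrative assumption rather than anything from a real registration pipeline: the model y = a * exp(b * x), the function names, and the simple multiply/divide-by-3 damping schedule.

```python
import math


def residuals(a, b, xs, ys):
    """Residuals r_i = a * exp(b * x_i) - y_i for the toy model."""
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(a, b, xs):
    """Rows (dr_i/da, dr_i/db) of the residual Jacobian."""
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def damped_gauss_newton(start, xs, ys, lam=1e-2, iters=200):
    """Minimize sum(r_i^2) with steps -(J^T J + lam * I)^{-1} J^T r."""
    a, b = start
    for _ in range(iters):
        r = residuals(a, b, xs, ys)
        J = jacobian(a, b, xs)
        cost = sum(ri * ri for ri in r)

        # Regularized Gauss-Newton matrix H = J^T J + lam * I (2x2, symmetric).
        h11 = sum(ja * ja for ja, _ in J) + lam
        h12 = sum(ja * jb for ja, jb in J)
        h22 = sum(jb * jb for _, jb in J) + lam

        # Gradient g = J^T r.
        g1 = sum(ja * ri for (ja, _), ri in zip(J, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(J, r))

        # Solve H * step = -g with the closed-form 2x2 inverse.
        det = h11 * h22 - h12 * h12
        da = -(h22 * g1 - h12 * g2) / det
        db = -(h11 * g2 - h12 * g1) / det

        new_r = residuals(a + da, b + db, xs, ys)
        new_cost = sum(ri * ri for ri in new_r)
        if new_cost < cost:
            # Step helped: accept it and trust the quadratic model more.
            a, b = a + da, b + db
            lam = max(lam / 3.0, 1e-12)
        else:
            # Step hurt: reject it and increase the damping.
            lam *= 3.0
    return a, b


# Noise-free toy data generated with a = 2.0, b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_fit, b_fit = damped_gauss_newton((1.0, 0.0), xs, ys)
```

On this noise-free toy data the fit should end up near the generating parameters. The first undamped-ish steps overshoot and get rejected, and the growing damping is what pulls the iteration back into the well-behaved region.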

K-FAC seems stable enough but is quite complex to implement. It's identical to a low-dimensional-block approximation of Gauss-Newton. Fisher information is only decoration.