Submitted by jarekduda t3_zb7xjb in MachineLearning
jarekduda OP t1_iyq9dj9 wrote
Reply to comment by SufficientStautistic in [R] SGD augmented with 2nd order information from seen sequence of gradients? by jarekduda
It is regularized Gauss-Newton, which generally looks quite suspicious: it approximates the Hessian with a positive definite ... for an extremely non-convex function.
How does it change the landscape of extrema?
Is it used for NN training? K-FAC uses a somewhat related Fisher information approximation, which is also positive definite.
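To make the concern concrete, here is a minimal sketch on a toy problem that is not from the thread: a single residual r(w) = w0*w1 - 1 with loss 0.5*r^2. The function names and the damping value `lam=0.1` are assumptions chosen only for illustration. At the chosen point the true Hessian is indefinite, while the regularized Gauss-Newton matrix J^T J + lam*I is positive definite by construction.

```python
import numpy as np

def residual(w):
    # Toy non-convex least-squares problem (hypothetical example).
    return np.array([w[0] * w[1] - 1.0])

def jacobian(w):
    # Derivative of the single residual with respect to (w0, w1).
    return np.array([[w[1], w[0]]])

def true_hessian(w):
    # Hessian of 0.5 * r^2 is J^T J + r * (second derivative of r).
    J = jacobian(w)
    r = residual(w)[0]
    d2r = np.array([[0.0, 1.0], [1.0, 0.0]])
    return J.T @ J + r * d2r

def regularized_gauss_newton(w, lam):
    # Drops the r * d2r term and adds lam * I, so the result is positive definite.
    J = jacobian(w)
    return J.T @ J + lam * np.eye(len(w))

w = np.array([0.5, 0.5])
hessian_eigs = np.linalg.eigvalsh(true_hessian(w))
gn_eigs = np.linalg.eigvalsh(regularized_gauss_newton(w, lam=0.1))
print("true Hessian eigenvalues:", hessian_eigs)
print("regularized Gauss-Newton eigenvalues:", gn_eigs)
```

The dropped term r * d2r is exactly where the negative curvature lives, so the approximation only matches the true Hessian where the residual is small.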
serge_cell t1_iyv2zag wrote
3D localization/registration/reconstruction are traditional areas of use for regularized Gauss-Newton, and all are highly non-convex. The trick is to start in a nearly convex area, sometimes after several tries, and/or to convexify with regularizers and/or sensor fusion.
K-FAC seems stable enough but is quite complex to implement. It's identical to a low-dimensional-blocks approximation of Gauss-Newton. Fisher information is only decoration.
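The "start in a nearly convex area, sometimes after several tries" idea can be sketched roughly as follows. This is an illustration under assumptions, not code from any registration pipeline: the helper `damped_gauss_newton`, the sine-frequency fitting problem, the start values, and the damping schedule (divide or multiply `lam` by 3) are all hypothetical choices. Each start runs a damped Gauss-Newton loop that accepts a step only if the loss decreases, and the best result over all starts is kept.

```python
import numpy as np

def damped_gauss_newton(residual, jacobian, w0, lam=1e-2, steps=100):
    w = np.asarray(w0, dtype=float)
    loss = 0.5 * np.sum(residual(w) ** 2)
    for _ in range(steps):
        r, J = residual(w), jacobian(w)
        # Regularized normal equations: (J^T J + lam * I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + lam * np.eye(w.size), -J.T @ r)
        new_loss = 0.5 * np.sum(residual(w + delta) ** 2)
        if new_loss < loss:
            # Step helped: accept it and trust the Gauss-Newton model more.
            w, loss = w + delta, new_loss
            lam = max(lam / 3.0, 1e-12)
        else:
            # Step hurt: reject it and regularize more strongly.
            lam *= 3.0
    return w, loss

# Toy non-convex problem: recover the frequency of a sine wave.
t = np.linspace(0.0, 5.0, 50)
y = np.sin(2.0 * t)

def residual(w):
    return np.sin(w[0] * t) - y

def jacobian(w):
    return (t * np.cos(w[0] * t)).reshape(-1, 1)

# Several tries: only starts close enough to the true frequency are
# expected to land in the nearly convex basin around it.
starts = [0.5, 1.8, 6.0]
results = [damped_gauss_newton(residual, jacobian, [s]) for s in starts]
best_w, best_loss = min(results, key=lambda pair: pair[1])
print("best frequency estimate:", best_w[0], "loss:", best_loss)
```

Starts far from the true frequency may stall in a poor local minimum, which is why the final answer is picked by comparing losses across tries rather than trusting any single run.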