Submitted by jarekduda t3_zb7xjb in MachineLearning
Red-Portal t1_iyprcgo wrote
Isn't what you implemented more or less a variant of BFGS? Stochastic BFGS is well known not to work well on deep neural networks.
jarekduda OP t1_iypsbet wrote
Indeed, BFGS seems the closest to my approach (OGR), but it is relatively costly: it needs many matrix products per step, uses only a few gradients per step, and weights those gradients equally.
In contrast, OGR is literally online linear regression of gradients: each step updates 4 averages and e.g. performs an eigendecomposition (which can be done more cheaply). It uses exponentially weakening weights, focusing on the local situation while still exploiting all previous gradients. It should also be compatible with slow evolution of the considered local subspace of interest.
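A minimal sketch of how such a scheme could look, based only on the description above (online linear regression of gradients, g ≈ H(theta − theta*), maintained via 4 exponentially weighted averages). All names and parameters here (`beta`, `eps`, the eigenvalue handling) are illustrative assumptions, not OP's actual implementation:

```python
# Hypothetical sketch of an OGR-style step, assuming the "4 averages" are
# EMAs of theta, g, theta*theta^T and g*theta^T. Not the author's code.
import numpy as np

class OGRSketch:
    def __init__(self, dim, beta=0.99, eps=1e-6):
        self.beta, self.eps = beta, eps          # EMA decay, eigenvalue floor
        # The 4 exponentially weighted averages mentioned in the comment:
        self.m_th = np.zeros(dim)                # E[theta]
        self.m_g = np.zeros(dim)                 # E[g]
        self.m_thth = np.zeros((dim, dim))       # E[theta theta^T]
        self.m_gth = np.zeros((dim, dim))        # E[g theta^T]

    def step(self, theta, grad):
        b = self.beta
        self.m_th = b * self.m_th + (1 - b) * theta
        self.m_g = b * self.m_g + (1 - b) * grad
        self.m_thth = b * self.m_thth + (1 - b) * np.outer(theta, theta)
        self.m_gth = b * self.m_gth + (1 - b) * np.outer(grad, theta)
        # Linear regression slope of gradients vs. positions:
        # H ~ cov(g, theta) cov(theta, theta)^-1
        cov_thth = self.m_thth - np.outer(self.m_th, self.m_th)
        cov_gth = self.m_gth - np.outer(self.m_g, self.m_th)
        H = cov_gth @ np.linalg.pinv(cov_thth)
        # Symmetrize and eigendecompose; taking |eigenvalues| is one common
        # choice so that saddle directions are repelled, not attracted.
        w, V = np.linalg.eigh((H + H.T) / 2)
        w = np.maximum(np.abs(w), self.eps)
        # Newton-like step toward the modeled minimum of the local quadratic:
        # theta* = mean(theta) - H^-1 mean(g)
        return self.m_th - V @ ((V.T @ self.m_g) / w)
```

The appeal, as described, is that the per-step state is just those running averages, so old gradients decay smoothly instead of being either kept at full weight or discarded, as in limited-memory quasi-Newton methods.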