jnez71 t1_ivp3sju wrote
Reply to comment by CPOOCPOS in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Hm, there may be a way to exploit that cheapened average-gradient computation to get curvature information too, which can help a lot.
I am reminded of how a covariance matrix is really just composed of means: `cov[g,g] = E[gg'] - E[g]E[g']` (where `'` is transpose). If `g` is distributed as the gradients in your volume, I suspect that `cov[g,g]` is related to the Hessian, and you can get that covariance with basically just averages of `g`.
More intuitively I'm thinking, "in this volume, how much on average does the gradient differ from the average gradient?" If your quantum computer really makes that volume averaging trivial, then I suspect someone would have already worked this out as some kind of "quantum Newton's method."
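To make that concrete, here's a rough classical sketch of the covariance idea on a made-up 2-D objective (the test function and all names are my own, purely for illustration):

```python
import numpy as np

# Made-up smooth objective: f(x) = 0.5 x'Ax + sin(x[0])
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(x):
    return A @ x + np.array([np.cos(x[0]), 0.0])

def hess(x):
    H = A.copy()
    H[0, 0] -= np.sin(x[0])
    return H

rng = np.random.default_rng(0)
x0 = np.array([0.5, -0.3])
r = 1e-2  # radius of the averaging "volume" (a disk here)

# Sample uniformly from the disk of radius r around x0
n = 20_000
theta = rng.uniform(0.0, 2.0 * np.pi, n)
rad = r * np.sqrt(rng.uniform(size=n))  # sqrt => uniform in area
pts = x0 + np.stack([rad * np.cos(theta), rad * np.sin(theta)], axis=1)

# cov[g,g] = E[gg'] - E[g]E[g'] -- built from plain averages of g
g = np.array([grad(p) for p in pts])
cov_g = np.cov(g.T)

# Locally g(x) ~= g(x0) + H(x - x0), and cov[x] = (r^2/4) I for a
# uniform 2-D disk, so we expect cov[g] ~= (r^2/4) H H'
print(cov_g)
print(r**2 / 4 * hess(x0) @ hess(x0).T)
```

One caveat: in this local-linear picture the covariance gives you `H H'` (scaled by the volume's spread) rather than `H` itself, so you'd still need something like a matrix square root to use it in a Newton step.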
I think that's all I got for ya. Good luck!
CPOOCPOS OP t1_ivp4z7j wrote
Thanks!! Wish you a good day!!
jnez71 t1_ivp6ril wrote
Oh I should add that from a nonconvex optimization perspective, the volume-averaging could provide heuristic benefits akin to GD+momentum-type optimizers. (Edited my first comment to reflect this).
Try playing around with your idea in low dimensions on a classical computer to get a feel for it first. Might help you think of new ways to research it.
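For instance, a tiny 1-D experiment might look like this (everything below is made up just to show the smoothing effect):

```python
import numpy as np

rng = np.random.default_rng(1)

def grad(x):
    # f(x) = x^2 + 2 sin(5x): a convex bowl plus high-frequency wiggles
    return 2.0 * x + 10.0 * np.cos(5.0 * x)

def avg_grad(x, r=0.5, n=1000):
    # Monte-Carlo stand-in for the (hypothetical) quantum volume average
    return grad(x + rng.uniform(-r, r, size=n)).mean()

x_plain = x_avg = 3.0
for _ in range(500):
    x_plain -= 0.01 * grad(x_plain)    # plain GD gets trapped in a wiggle
    x_avg   -= 0.01 * avg_grad(x_avg)  # averaged GD smooths the wiggles out

# x_plain stalls in a shallow local minimum (~3.4);
# x_avg pushes much deeper into the bowl (~0.8)
print(x_plain, x_avg)
```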