
CPOOCPOS OP t1_ivpmo8h wrote

>divergence

Hi bloc!! Thanks for your answer.

By taking the Laplacian, you mean taking the Laplacian (∇·∇f) at all points and averaging? Yes, this is also possible. Not in a single go, but I can get the second derivative at all points for each parameter and add them up. How would that help? Or what is a higher-order optimisation?
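
To make sure we mean the same thing, here is a minimal JAX sketch (with a toy loss standing in for my actual network) of the quantity I could get at each point: the second derivatives with respect to each parameter, summed up, i.e. the trace of the Hessian:

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the network's loss over its parameters theta.
def loss(theta):
    return jnp.sum(jnp.sin(theta) ** 2)

# Laplacian of f at theta: sum of the second partial derivatives
# with respect to each parameter, i.e. the trace of the Hessian.
def laplacian(f, theta):
    return jnp.trace(jax.hessian(f)(theta))

theta = jnp.array([0.3, -1.2, 0.7])
print(laplacian(loss, theta))
```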

1

CPOOCPOS OP t1_ivp0tap wrote

Thanks for your reply, jnez! Yes, I have actually also had the thought of using the average of many local points to estimate the local curvature, like what is needed in BFGS.

You are right that, in a classical setting, there are far better things to do with many adjacent gradient computations. But here I am doing machine learning on a quantum computer, and the interesting part is that it is very cheap to calculate the average (and only the average) over many points. To be more concrete about the computational cost: it takes only linear effort to compute the average over an exponential number of points.

When I was first developing the idea, I just thought of the procedure as a vote among a bunch of local points on which direction they would like to go. But now I am looking for more concrete theoretical arguments for why it makes sense to take the average gradient (since on a quantum computer I wouldn't have the computational overhead this would incur classically).

1

CPOOCPOS OP t1_ivovm1t wrote

Hi, and thanks for your reply! I just looked into smoothing, and it seems to be a kind of data manipulation: the data we have is smoothed to find trends.

Here I don't actually have data; what I am averaging over is a volume of the parameter space, where the parameters are the learnable parameters of my network.
In other words, when I update my parameters with GD, I would like to average the gradients of all points (in the parameter space) lying close to my center point (i.e. the point whose gradient I would usually take).
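
To illustrate classically what I mean, here is a rough Monte Carlo sketch of one such update (with a toy loss standing in for my network; on the quantum computer the average would be computed directly rather than by sampling):

```python
import jax
import jax.numpy as jnp

# Toy stand-in for the network's loss over its learnable parameters theta.
def loss(theta):
    return jnp.sum(jnp.sin(theta) ** 2)

grad_loss = jax.grad(loss)

def averaged_gradient(theta, key, radius=0.1, n_samples=256):
    # Sample points in a small cube around theta and average their gradients.
    offsets = jax.random.uniform(key, (n_samples, theta.shape[0]),
                                 minval=-radius, maxval=radius)
    grads = jax.vmap(grad_loss)(theta + offsets)
    return grads.mean(axis=0)

theta = jnp.array([0.3, -1.2, 0.7])
key = jax.random.PRNGKey(0)
theta = theta - 0.05 * averaged_gradient(theta, key)  # one "smoothed" GD step
```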

0