CPOOCPOS OP t1_ivp4z7j wrote
Reply to comment by jnez71 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Thanks!! Have a good day!!
CPOOCPOS OP t1_ivp0tap wrote
Reply to comment by jnez71 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Thanks for your reply, jnez! Yes, I have actually also had the thought of using the average over many local points to estimate the local curvature, as needed in BFGS.
You are right that in a classical setting there are far better things to do with many adjacent gradient computations. But here I am doing machine learning on a quantum computer, and the interesting part is that it is very cheap to compute the average (and only the average) of the gradients at many points. To be more concrete about the computational cost: it takes only linear effort to compute the average over an exponential number of points.
When I was first developing the idea, I simply thought of the procedure as a vote among a bunch of local points on which direction they would like to go. But now I am looking for more concrete theoretical arguments for why it makes sense to take the average gradient (since on a quantum computer I don't have the computational overhead this would carry classically).
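To illustrate classically what I mean by the averaged gradient, here is a minimal Python sketch (the toy loss, the sampling radius and the number of samples are just illustrative placeholders, not part of my actual setup; classically this costs one gradient evaluation per sampled point, which is exactly the overhead the quantum routine avoids):

```python
import numpy as np

def loss(theta):
    # toy loss standing in for the network's cost function
    return np.sum(theta**2) + 0.1 * np.sin(10 * theta).sum()

def grad(theta, eps=1e-5):
    # gradient via central finite differences, one parameter at a time
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

def averaged_grad(theta, radius=0.05, n_samples=64, rng=None):
    # Monte-Carlo estimate of the mean gradient over a small cube around theta;
    # classically this needs n_samples full gradient evaluations
    rng = np.random.default_rng(0) if rng is None else rng
    total = np.zeros_like(theta)
    for _ in range(n_samples):
        delta = rng.uniform(-radius, radius, size=theta.shape)
        total += grad(theta + delta)
    return total / n_samples

theta = np.array([0.3, -0.7, 1.2])
print("single-point gradient :", grad(theta))
print("neighbourhood average :", averaged_grad(theta))
```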
CPOOCPOS OP t1_ivowysr wrote
Reply to comment by bluuerp in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
This sounds similar to what fredditor_1 was explaining. I will look into it!
Thanks a lot
CPOOCPOS OP t1_ivowh92 wrote
Reply to comment by [deleted] in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Ohh okay! That actually sounds exciting! Thanks a lot
CPOOCPOS OP t1_ivovm1t wrote
Reply to comment by [deleted] in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
Hi, and thanks for your reply! I just looked into smoothing, and it seems to be a kind of data manipulation: the data we have is smoothed to reveal trends.
Here I don't actually have data; what I am averaging over is a volume in parameter space, where the parameters are the learnable parameters of my network.
In other words, when I update my parameters with GD, I would like to average the gradients of all points (in parameter space) lying close to my center point (the point whose gradient I would usually take).
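To write down what I mean (my own notation, not from the thread): with loss L, current parameters θ, and a small ball B_ε(θ) around them, the update direction would be the neighbourhood-averaged gradient

```latex
\bar g(\theta)
  = \frac{1}{\operatorname{vol} B_\varepsilon(\theta)}
    \int_{B_\varepsilon(\theta)} \nabla L(\theta')\,\mathrm{d}\theta'
  \;\approx\; \frac{1}{N} \sum_{k=1}^{N} \nabla L(\theta + \delta_k),
  \qquad \delta_k \sim \mathrm{Unif}\big(B_\varepsilon(0)\big),
```

where the sum is the classical Monte-Carlo estimate of the same quantity.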
CPOOCPOS OP t1_ivpmo8h wrote
Reply to comment by bloc97 in [D] Is there an advantage in learning when taking the average Gradient compared to the Gradient of just one point by CPOOCPOS
>divergence
Hi bloc!! Thanks for your answer!
By taking the Laplacian, you mean taking the Laplacian (∇·∇f) at all points and averaging? Yes, this is also possible, though not in a single go: I can get the second derivative with respect to each parameter at all points and add them up. How would that help? And what is a higher-order optimisation?
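Just to make sure we mean the same thing, here is a minimal classical sketch of "the second derivative for each parameter, added up", i.e. the Laplacian as the trace of the Hessian (the toy loss and the finite-difference step are only placeholders):

```python
import numpy as np

def loss(theta):
    # toy loss standing in for the network's cost function
    return np.sum(theta**2) + 0.1 * np.sin(10 * theta).sum()

def laplacian(theta, eps=1e-4):
    # sum of d^2 L / d theta_i^2 over all parameters, via central differences
    lap = 0.0
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        lap += (loss(theta + e) - 2.0 * loss(theta) + loss(theta - e)) / eps**2
    return lap

theta = np.array([0.3, -0.7, 1.2])
print("Laplacian (trace of the Hessian):", laplacian(theta))
```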