Saw a tweet claiming that, with some "quirky tricks", Nesterov momentum can be obtained as a special case of PID control. I did a Google search, but it returned nothing relevant.

Is this a well-known result in optimisation that I'm not aware of? Or have I just not looked hard enough? If someone can point me to relevant references, that'd be great.

Comments


TheNovicePhilomath t1_j25wla1 wrote on December 29, 2022 at 10:00 PM

I don't think this is a standard result, or at least I haven't encountered it. After some digging, I found this paper, which seems to give a good explanation of the similarities between Nesterov momentum and PID control (section 3).

Also, the idea behind the paper linked in the Twitter thread just blew my mind. So obvious, yet beautiful: a Kalman filter used as an optimiser, estimating network parameters from noisy loss measurements. Great stuff.
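For anyone who hasn't met a Kalman filter before, here is a minimal scalar sketch of its predict/update cycle, assuming a random-walk model for the tracked quantity and applying it to a synthetic noisy loss curve. This is not the method from the linked paper (which estimates network parameters, not just the loss value); the function `kalman_smooth`, the noise variances `q` and `r`, and the data are all made up for illustration.

```python
import random


def kalman_smooth(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter assuming a random-walk state model.

    q: process-noise variance (how much the true value may drift per step)
    r: measurement-noise variance (how noisy each reading is)
    x0, p0: initial estimate and its variance
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean and inflates the variance.
        p = p + q
        # Update: the Kalman gain k weighs the new reading against the prediction.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Synthetic example (made-up data): a slowly decaying "true" loss
# observed through Gaussian noise with standard deviation 0.3.
random.seed(0)
true_loss = [2.0 * 0.99 ** t for t in range(200)]
noisy = [v + random.gauss(0.0, 0.3) for v in true_loss]
est = kalman_smooth(noisy, q=1e-3, r=0.3 ** 2, x0=noisy[0])

print("raw readings MSE:", mse(noisy, true_loss))
print("filtered MSE:   ", mse(est, true_loss))
```

With these settings the gain settles at roughly 0.1, so the filter behaves much like an exponential moving average whose weight comes from the assumed noise levels instead of being tuned by hand.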

cruddybanana1102 OP t1_j262t6l wrote on December 29, 2022 at 10:42 PM

Ikr! It blew my mind to see optimal-control-inspired design of new optimizers! It shouldn't be surprising really, but I can't not appreciate it. Also loveeeee the Kalman filter paper!!!!! And thanks for digging out that paper for me. Haven't gone through it fully yet, but it looks promising.

resented_ape t1_j26ei9b wrote on December 30, 2022 at 12:02 AM

The Quasi-Hyperbolic Momentum paper, which attempts to generalize both classical and Nesterov momentum, discusses this in section 4.2. It references a blog post by Ben Recht.
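A minimal sketch of the QHM update as I understand it from that paper: a buffer `g` holding an exponential moving average of gradients, and a step that mixes the raw gradient and the buffer with weight `nu`. Setting `nu = 0` gives plain SGD and `nu = 1` gives EMA-style momentum. The paper says `nu = beta` recovers Nesterov's method; the snippet checks that numerically against the "gradient at current parameters" form of Nesterov momentum, using a step-size rescaling `alpha = lr / (1 - beta)` that I derived for this comparison. The quadratic objective and all constants are arbitrary.

```python
def qhm(grad_fn, theta0, alpha, beta, nu, steps):
    """Quasi-hyperbolic momentum on a scalar parameter.

    g     <- beta * g + (1 - beta) * grad
    theta <- theta - alpha * ((1 - nu) * grad + nu * g)
    """
    theta, g_buf = theta0, 0.0
    traj = [theta]
    for _ in range(steps):
        grad = grad_fn(theta)
        g_buf = beta * g_buf + (1.0 - beta) * grad
        theta = theta - alpha * ((1.0 - nu) * grad + nu * g_buf)
        traj.append(theta)
    return traj


def nesterov(grad_fn, theta0, lr, mu, steps):
    """Nesterov momentum written with the gradient at the current parameters."""
    theta, v = theta0, 0.0
    traj = [theta]
    for _ in range(steps):
        grad = grad_fn(theta)
        v = mu * v + grad
        theta = theta - lr * (grad + mu * v)
        traj.append(theta)
    return traj


def grad_fn(x):
    """Gradient of the toy objective f(x) = 1.5 * x**2 - x."""
    return 3.0 * x - 1.0


lr, mu = 0.05, 0.9
a = qhm(grad_fn, 5.0, alpha=lr / (1.0 - mu), beta=mu, nu=mu, steps=100)
b = nesterov(grad_fn, 5.0, lr, mu, steps=100)
print(max(abs(x - y) for x, y in zip(a, b)))  # round-off level if they agree
```

The two trajectories should match up to floating-point round-off, because the QHM buffer is just the Nesterov buffer scaled by `1 - beta`.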

Red-Portal t1_j26qhza wrote on December 30, 2022 at 1:28 AM

I don't see why one would have to go as far as a PID controller. The relationship between linear dynamical systems and momentum-based SGD algorithms is pretty straightforward: on a quadratic objective, a momentum method literally is a linear dynamical system. In fact, Lyapunov-function-based analysis of SGD algorithms is pretty common.
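To make the linear-dynamical-systems point concrete, here is a small sketch assuming the simplest case, heavy-ball momentum on the 1D quadratic f(x) = 0.5·h·x². Stacking (x_t, x_{t-1}) into a state vector turns the iteration into z_{t+1} = A z_t for a fixed 2×2 matrix A, so convergence comes down to whether the spectral radius of A is below 1. The function names and constants are hypothetical, chosen only for illustration.

```python
import cmath


def heavy_ball(x0, alpha, beta, h, steps):
    """Heavy-ball momentum on f(x) = 0.5 * h * x**2, starting at rest."""
    x_prev, x = x0, x0
    traj = [x]
    for _ in range(steps):
        x_next = x - alpha * h * x + beta * (x - x_prev)
        x_prev, x = x, x_next
        traj.append(x)
    return traj


def linear_system(x0, alpha, beta, h, steps):
    """The same iteration written as z_{t+1} = A z_t with z_t = (x_t, x_{t-1})."""
    A = ((1.0 + beta - alpha * h, -beta),
         (1.0, 0.0))
    z = (x0, x0)
    traj = [z[0]]
    for _ in range(steps):
        z = (A[0][0] * z[0] + A[0][1] * z[1],
             A[1][0] * z[0] + A[1][1] * z[1])
        traj.append(z[0])
    return traj, A


def spectral_radius_2x2(A):
    """Largest |eigenvalue| of a 2x2 matrix via its characteristic polynomial."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


hb = heavy_ball(1.0, alpha=0.1, beta=0.9, h=1.0, steps=100)
lds, A = linear_system(1.0, alpha=0.1, beta=0.9, h=1.0, steps=100)
print(max(abs(p - q) for p, q in zip(hb, lds)))  # the two forms agree
print(spectral_radius_2x2(A))  # below 1, so the iterates converge
```

For alpha = 0.1, beta = 0.9, h = 1 the eigenvalues are 0.9 ± 0.3i, so the spectral radius is sqrt(0.9) ≈ 0.949 and the iterates spiral into the minimum.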

bubudumbdumb t1_j28j898 wrote on December 30, 2022 at 12:28 PM

TIL: Nesterov momentum is an extension of momentum that computes the decaying moving average of gradients evaluated at projected (look-ahead) positions in the search space, rather than at the current positions themselves.
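A minimal sketch of that difference, assuming the common "velocity" formulation (rather than the moving-average form described above) and a toy 1D quadratic f(x) = 0.5·x². The only change in Nesterov's version is that the gradient is evaluated at the look-ahead point `x + mu * v` instead of at `x`. Step size, momentum and step count are arbitrary.

```python
def grad(x, h=1.0):
    """Gradient of the toy objective f(x) = 0.5 * h * x**2."""
    return h * x


def classical_momentum(x0, lr, mu, steps):
    x, v = x0, 0.0
    xs = [x]
    for _ in range(steps):
        v = mu * v - lr * grad(x)          # gradient at the current position
        x = x + v
        xs.append(x)
    return xs


def nesterov_momentum(x0, lr, mu, steps):
    x, v = x0, 0.0
    xs = [x]
    for _ in range(steps):
        lookahead = x + mu * v             # where momentum alone would take us
        v = mu * v - lr * grad(lookahead)  # gradient at the projected position
        x = x + v
        xs.append(x)
    return xs


xs_c = classical_momentum(1.0, lr=0.1, mu=0.9, steps=200)
xs_n = nesterov_momentum(1.0, lr=0.1, mu=0.9, steps=200)
print(xs_c[-1], xs_n[-1])
```

With lr = 0.1 and mu = 0.9 on this quadratic, the classical iterates contract by about sqrt(0.9) ≈ 0.95 per step and the Nesterov iterates by about 0.9, so the look-ahead gradient damps the oscillation noticeably faster.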

I took a course on control theory, and the ingredients of Nesterov momentum seem to be common building blocks of linear control systems: moving averages and decay. PID control is arguably the most widespread industrial application of linear control theory.
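One way to see the control flavour, sketched here under my own assumptions (this is not necessarily the "quirky tricks" construction from the tweet): take the form of Nesterov momentum that uses the gradient at the current parameters, v_t = mu·v_{t-1} + g_t and direction d_t = g_t + mu·v_t (the form I believe PyTorch's SGD uses with `nesterov=True` and no dampening). A little algebra gives d_t = mu·d_{t-1} + g_t + mu·(g_t − g_{t-1}), so the look-ahead shows up as a derivative-like gradient-difference term. The function names below are hypothetical, and the gradient sequence is random because the identity holds for any sequence.

```python
import random


def nesterov_directions(grads, mu):
    """d_t = g_t + mu * v_t, with momentum buffer v_t = mu * v_{t-1} + g_t."""
    v, ds = 0.0, []
    for g in grads:
        v = mu * v + g
        ds.append(g + mu * v)
    return ds


def pid_style_directions(grads, mu):
    """Same directions via d_t = mu * d_{t-1} + g_t + mu * (g_t - g_{t-1}):
    a leaky integral of the "error" g_t plus a gradient-difference (D-like) term."""
    d_prev, g_prev, ds = 0.0, 0.0, []
    for g in grads:
        d = mu * d_prev + g + mu * (g - g_prev)
        ds.append(d)
        d_prev, g_prev = d, g
    return ds


random.seed(0)
grads = [random.gauss(0.0, 1.0) for _ in range(50)]
a = nesterov_directions(grads, mu=0.9)
b = pid_style_directions(grads, mu=0.9)
print(max(abs(x - y) for x, y in zip(a, b)))  # round-off level if they agree
```

Unrolling the recurrence gives d_t = Σ mu^(t−i)·g_i + mu·Σ mu^(t−i)·(g_i − g_{i−1}). The first sum is a decaying integral of the gradient (an I-like term); the second is a decaying sum of gradient differences, which for mu close to 1 roughly telescopes to g_t and so behaves like a P term.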

Andthentherewere2 t1_j28sgje wrote on December 30, 2022 at 1:56 PM

RemindMe! 3 weeks
