
schwagggg t1_j916cc5 wrote

Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator

So actor-critic without the critic is just policy gradient / REINFORCE / the score function gradient: the first two names are used in RL, the last one in stats/OR.

Short answer: policy gradient tends to have high variance empirically, so people use control variates to reduce that variance, and the critic is simply the control variate.

High-variance methods usually converge to worse local minima than low-variance ones. You can verify this by taking out the critic function entirely; try it yourself with that tutorial.
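A minimal sketch of what that experiment looks like, assuming a PyTorch-style setup; `policy_gradient_loss`, `log_probs`, `returns`, and `values` are placeholder names I'm making up here, not anything from the tutorial:

```python
import torch

def policy_gradient_loss(log_probs, returns, values=None):
    """Score-function (REINFORCE) surrogate loss.

    log_probs: log pi(a_t | s_t) for the sampled actions (requires grad)
    returns:   Monte Carlo returns G_t
    values:    optional critic estimates V(s_t), used as a baseline
    """
    if values is None:
        advantages = returns                    # plain REINFORCE: high variance
    else:
        advantages = returns - values.detach()  # critic acts as a control variate
    # gradient of this loss is -E[ grad log pi(a|s) * advantage ],
    # i.e. the (negated) policy gradient estimator
    return -(log_probs * advantages).mean()
```

Passing `values=None` is the "take out the critic" version: same estimator, just no baseline, so the gradient variance (and typically the final policy quality) gets worse.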


schwagggg t1_j6x7eh7 wrote

I recently found a paper from Blei's lab that uses NFs to learn KL(p||q) instead of KL(q||p) variational inference (might be what the other commenter is referring to), but I'm afraid that's not what you are interested in.
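For reference, these are the standard definitions (not anything specific to that paper); the two objectives differ only in which distribution the expectation is taken under:

```latex
% reverse KL (the usual VI objective) vs. forward KL
\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{p(z \mid x)}\right]
\qquad
\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p(z \mid x)}\!\left[\log \frac{p(z \mid x)}{q(z)}\right]
```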

Apart from that, the last application-side SOTA I can remember was GLOW.


schwagggg t1_iswqh92 wrote

cool stuff!

2 things:

  1. I am still trying to wrap my head around how to do this: say we have a 2-layer NN with Bernoulli neurons, how do you take the derivative with respect to the first layer's weights in this case?

  2. It seems to me that this approach needs many function evaluations; does it scale well with the number of stochastic variables? If I use it for a VAE with an expensive decoder and, say, 1024 stochastic latents, would it be bad?
