schwagggg t1_j916cc5 wrote
Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator
so actor critic without the critic is just policy gradient / REINFORCE / score function gradient; the first two names are used in RL, the last one in stats/OR.
short answer is that the policy gradient tends to have high variance empirically, so people use control variates to control that variance, and the critic is simply the control variate.
high variance methods usually converge to worse local minima than low variance ones. you can verify this by taking out the critic entirely, try it yourself with that tutorial.
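a rough numerical sketch of the variance point, in case it helps: toy 1-d gaussian "policy" and a made-up reward, nothing here is from that tutorial. subtracting a baseline (the "critic" reduced to a constant control variate) keeps the gradient estimate unbiased but cuts its variance a lot.

```python
# toy example: score-function (REINFORCE) gradient of E_{x~N(mu,1)}[f(x)] wrt mu,
# with and without a constant baseline acting as a control variate.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.0
f = lambda x: -(x - 3.0) ** 2        # made-up reward
score = lambda x: x - mu             # d/dmu log N(x; mu, 1)

n = 10_000
x = rng.normal(mu, 1.0, size=n)

g_plain = f(x) * score(x)            # vanilla score-function estimator
baseline = f(rng.normal(mu, 1.0, size=n)).mean()   # crude "critic": constant baseline from an independent batch
g_cv = (f(x) - baseline) * score(x)  # control-variate version

print("means (should match):", g_plain.mean(), g_cv.mean())
print("variances (plain vs baseline):", g_plain.var(), g_cv.var())
```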
schwagggg t1_j914xct wrote
Reply to comment by aCuRiOuSguuy in [D] Simple Questions Thread by AutoModerator
can you share the syllabus and some of the early assignments?
schwagggg t1_j7avvo1 wrote
Reply to comment by jimmymvp in [D] Normalizing Flows in 2023? by wellfriedbeans
hey thanks for the reference, let me take a look.
schwagggg t1_j70gicd wrote
Reply to comment by OptimizedGarbage in [D] Normalizing Flows in 2023? by wellfriedbeans
https://arxiv.org/abs/2202.01841
the score climbing part comes from https://proceedings.neurips.cc/paper/2020/hash/b20706935de35bbe643733f856d9e5d6-Abstract.html
schwagggg t1_j6x7eh7 wrote
Reply to [D] Normalizing Flows in 2023? by wellfriedbeans
i recently found a paper from Blei's lab that uses NFs to learn with KL(p||q) instead of KL(q||p) variational inference (might be what the other commenter is referring to), but i'm afraid that's not what you are interested in.
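just to pin down my shorthand (standard VI notation, mine and not the paper's):

```latex
% klqp: the reverse / exclusive KL minimized in standard VI
\mathrm{KL}(q_\phi \,\|\, p) = \mathbb{E}_{z \sim q_\phi(z)}\!\left[\log q_\phi(z) - \log p(z \mid x)\right]
% klpq: the forward / inclusive KL that the score-climbing approach targets
\mathrm{KL}(p \,\|\, q_\phi) = \mathbb{E}_{z \sim p(z \mid x)}\!\left[\log p(z \mid x) - \log q_\phi(z)\right]
```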
apart from that, the last application-wise SOTA i can remember was GLOW.
schwagggg t1_iw5vivc wrote
Reply to comment by WigglyHypersurface in [D] When was the last time you wrote a custom neural net? by cautioushedonist
were you able to use the measure-valued derivative for the poisson? you posted a thread about it a couple months ago
schwagggg t1_iubqpt2 wrote
so y’all boycotted so that the politicians can get those gifts😂
schwagggg t1_iu4slj7 wrote
Reply to [D] DL Practitioners, Do You Use Layer Visualization Tools s.a GradCam in Your Process? by DisWastingMyTime
no
mostly just common sense, tensorboard for grad history is good enough
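for concreteness, a minimal sketch of the kind of logging i mean, plain PyTorch + TensorBoard with an arbitrary stand-in model and data:

```python
# log each parameter's gradient histogram to TensorBoard every step
import torch
from torch.utils.tensorboard import SummaryWriter

model = torch.nn.Linear(16, 1)                       # stand-in model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
writer = SummaryWriter("runs/grad_history")

for step in range(100):
    x, y = torch.randn(8, 16), torch.randn(8, 1)     # fake batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    for name, p in model.named_parameters():
        if p.grad is not None:
            writer.add_histogram(f"grad/{name}", p.grad, step)  # gradient histogram per step
    opt.step()

writer.close()
```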
schwagggg t1_isxz4cw wrote
Reply to comment by ChrisRackauckas in [P] Stochastic Differentiable Programming: Unbiased Automatic Differentiation for Discrete Stochastic Programs (such as particle filters, agent-based models, and more!) by ChrisRackauckas
then this sounds a bit like measure-valued derivatives? you perturb, then calculate the derivative. wouldn't this be at least O(D) expensive for one layer, and O(LD) for L layers of D-dimensional RVs?
schwagggg t1_iswqh92 wrote
Reply to [P] Stochastic Differentiable Programming: Unbiased Automatic Differentiation for Discrete Stochastic Programs (such as particle filters, agent-based models, and more!) by ChrisRackauckas
cool stuff!
2 things:
- i am still trying to wrap my head around how to do this: say we have a 2-layer NN with Bernoulli neurons, how do you take the derivative wrt the first layer's weights in this case? (rough sketch of what i mean below the list)
- it seems to me that this approach needs many function evaluations, does it scale well wrt the # of stochastic variables? if i use it for a VAE with an expensive decoder and say 1024 stochastic latents, would it be bad?
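to make the first question concrete, here is the one way i know how to do it, a score-function (REINFORCE) estimate of dL/dW1 through the Bernoulli layer; the shapes, squared-error loss, and target are all made up, and this is not necessarily what the paper does:

```python
# toy 2-layer net: h ~ Bernoulli(sigmoid(W1 x)), y = W2 h, loss = ||y - target||^2
# estimate dL/dW1 with the score-function estimator: E[ loss * d log p(h|x;W1)/dW1 ]
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

D_in, D_h, D_out = 4, 8, 2
W1 = rng.normal(size=(D_h, D_in)) * 0.1
W2 = rng.normal(size=(D_out, D_h)) * 0.1

x = rng.normal(size=D_in)
target = np.zeros(D_out)

n_samples = 5_000
grads = np.zeros((n_samples,) + W1.shape)
for s in range(n_samples):
    p = sigmoid(W1 @ x)                       # Bernoulli probabilities of hidden units
    h = (rng.random(D_h) < p).astype(float)   # sample the stochastic layer
    y = W2 @ h
    loss = np.sum((y - target) ** 2)
    score = np.outer(h - p, x)                # d log p(h|x;W1) / dW1 for Bernoulli(sigmoid(W1 x))
    grads[s] = loss * score                   # one score-function gradient sample

print("Monte Carlo estimate of dL/dW1:\n", grads.mean(axis=0))
```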
schwagggg t1_jckar2a wrote
Reply to comment by bo_peng in [R] RWKV 14B ctx8192 is a zero-shot instruction-follower without finetuning, 23 token/s on 3090 after latest optimization (16G VRAM is enough, and you can stream layers to save more VRAM) by bo_peng
i thought it was
“r - dub - kay - vi”
which is a little long but unique