
schwagggg t1_j916cc5 wrote

Reply to comment by Oripy in [D] Simple Questions Thread by AutoModerator

So actor-critic without the critic is just policy gradient / REINFORCE / the score function gradient: the first two names are used in RL, the last one in stats/OR.

Short answer: policy gradient tends to have high variance empirically, so people use control variates to reduce that variance, and the critic is simply the control variate.

High-variance methods usually converge to worse local minima than low-variance ones. You can verify this by taking out the critic function entirely; try it yourself with that tutorial.
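A minimal sketch of what that experiment looks like, assuming a PyTorch-style setup; `policy_gradient_loss`, `log_probs`, `returns`, and `values` are placeholder names I'm making up here, not anything from the tutorial:

```python
import torch

def policy_gradient_loss(log_probs, returns, values=None):
    """Score-function (REINFORCE) surrogate loss.

    log_probs: log pi(a_t | s_t) for the sampled actions (requires grad)
    returns:   Monte Carlo returns G_t
    values:    optional critic estimates V(s_t), used as a baseline
    """
    if values is None:
        advantages = returns                    # plain REINFORCE: high variance
    else:
        advantages = returns - values.detach()  # critic acts as a control variate
    # gradient of this loss is -E[ grad log pi(a|s) * advantage ],
    # i.e. the (negated) policy gradient estimator
    return -(log_probs * advantages).mean()
```

Passing `values=None` is the "take out the critic" version: same estimator, just no baseline, so the gradient variance (and typically the final policy quality) gets worse.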


schwagggg t1_j6x7eh7 wrote

I recently found a paper from Blei's lab that uses NFs to learn KL(p||q) instead of KL(q||p) variational inference (might be what the other commenter is referring to), but I'm afraid that's not what you are interested in.
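For reference, these are the standard definitions (not anything specific to that paper); the two objectives differ only in which distribution the expectation is taken under:

```latex
% reverse KL (the usual VI objective) vs. forward KL
\mathrm{KL}(q \,\|\, p) = \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{p(z \mid x)}\right]
\qquad
\mathrm{KL}(p \,\|\, q) = \mathbb{E}_{p(z \mid x)}\!\left[\log \frac{p(z \mid x)}{q(z)}\right]
```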

Apart from that, the last application-side SOTA I can remember was GLOW.


schwagggg t1_iswqh92 wrote

cool stuff!

2 things:

  1. I am still trying to wrap my head around how to do this: say we have a 2-layer NN with Bernoulli neurons, how do you take the derivative with respect to the first layer's weights in this case?

  2. It seems to me that this approach needs many function evaluations; does it scale well with the number of stochastic variables? If I use it for a VAE with an expensive decoder and, say, 1024 stochastic latents, would it be bad?
