Submitted by smallest_meta_review t3_yng63w in MachineLearning
Comments
essahjott t1_iv9mkt6 wrote
life_is_harsh t1_iva1h5l wrote
I feel both are useful, no? I thought of reincarnation as how humans learn: we don't learn from a blank slate but often reuse our own learned knowledge or learn from others during our lifetime (e.g., when learning to play a sport, we might learn from an instructor but eventually learn on our own).
smallest_meta_review OP t1_iva1nr2 wrote
https://agarwl.github.io/reincarnating_rl for paper, code, blog post and trained agents.
smallest_meta_review OP t1_iva27vt wrote
While nurture + nature seems useful across lifetimes, reincarnation might be how we learn during our lifetimes? I am not an expert but I found this comment interesting:
> This must be a fundamental part of how primates like us learn, piggybacking off of an existing policy at some level, so I'm all for RL research that tries to formalize ways it can work computationally.
smallest_meta_review OP t1_iva2n3z wrote
LOL. This is the first thing I clarify whenever I talk about this work. Here it's in the context of reincarnating an existing RL agent into a new agent (possibly with a different architecture and algorithm).
BobDope t1_iva3q3o wrote
Ok that’s pretty dope
smallest_meta_review OP t1_iva4dj7 wrote
> Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.
More at https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1
whothatboah t1_ivadajt wrote
Very ignorantly speaking, but this gives off a bit of a genetic algorithm vibe...
smurfpiss t1_ivaf5ia wrote
Not very experienced with RL, but how is that different from an algorithm going through training iterations?
In that case the parameters are tweaked from past learned parameters. What's the benefit of learning from another algorithm? Is it some kind of weird offspring of skip connections and transfer learning?
smallest_meta_review OP t1_ivaghqa wrote
Good question. The original blog post somewhat covers this:
> Imagine a researcher who has trained an agent A_1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, Reincarnating RL provides the more viable option of transferring the existing agent A_1 to a different agent and training this agent further, or simply fine-tuning A_1.
But this is not what typically happens in research. For example, each time we train a new agent to, say, play an Atari game, we train it from scratch, ignoring all the prior agents trained on that game. This work argues: why not reuse the learned knowledge from existing agents while training new agents (which may be totally different)?
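As a toy illustration of the two workflows (just a sketch; the Agent class, method names, and step counts are made up, not the actual API):

```python
# Toy sketch of the two workflows; the Agent class, method names, and step counts are made up.

class Agent:
    def __init__(self, arch):
        self.arch = arch
        self.env_steps = 0

    def train(self, steps):
        # Stand-in for running RL in the environment.
        self.env_steps += steps

    def kickstart_from(self, teacher):
        # Stand-in for reusing the teacher's computation (e.g., distilling its policy).
        self.teacher_arch = teacher.arch


# Tabula rasa workflow: every new architecture/algorithm pays the full training cost again.
scratch_agent = Agent(arch="impala_resnet")
scratch_agent.train(steps=200_000_000)

# Reincarnating RL: transfer the existing agent A_1 into a (possibly very different) new agent,
# then keep training with RL -- ideally for far fewer environment steps.
a1 = Agent(arch="nature_dqn_cnn")       # pretend this was trained years ago
new_agent = Agent(arch="impala_resnet")
new_agent.kickstart_from(a1)
new_agent.train(steps=20_000_000)
```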
smurfpiss t1_ivah7ul wrote
So, transfer learning but with different architectures? That's pretty neat. Will give it a read thanks 😊
pm_me_your_pay_slips t1_ivai3l1 wrote
I THOUGHT REWARD WAS ALL YOU NEED
Dendriform1491 t1_ivaj27w wrote
At least in nature this happens because the environment is always changing and the value of training decays (some sort of "data drift").
ingambe t1_ivaj6e5 wrote
Evolution strategies work similarly to the process you described. For very small neural networks, they work very well, especially in environments with sparse or quasi-sparse rewards. But as soon as you try a larger neural net (CNN + MLP, or a Transformer-like arch), the process becomes super noisy and you either need to produce tons of offspring for the population or use gradient-based techniques.
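For intuition, the basic ES loop looks roughly like this (a minimal NumPy sketch of an OpenAI-ES-style update on a toy objective, not a full RL implementation; in RL the fitness would be an episodic return):

```python
import numpy as np

# Minimal OpenAI-ES-style update on a toy objective; in RL, fitness(theta) would be the
# episodic return of a policy with parameters theta (here it's just a quadratic bowl).

def fitness(theta):
    return -np.sum((theta - 3.0) ** 2)    # maximized at theta = 3 in every coordinate

rng = np.random.default_rng(0)
theta = np.zeros(10)                      # "parent" parameters (e.g., flattened network weights)
sigma, lr, n_offspring = 0.1, 0.02, 50

for generation in range(300):
    # Produce a population of perturbed offspring around the current parameters.
    noise = rng.standard_normal((n_offspring, theta.size))
    returns = np.array([fitness(theta + sigma * eps) for eps in noise])

    # Rank-normalize returns so the update is robust to reward scale and noise.
    ranks = returns.argsort().argsort()
    weights = ranks / (n_offspring - 1) - 0.5

    # Move the parent toward the better-performing perturbations (a gradient estimate).
    theta += lr / (n_offspring * sigma) * (noise.T @ weights)

print(theta.round(2))                     # should be close to 3.0 everywhere
```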
No_Contribution9334 t1_ivajqhv wrote
So well explained!
anonymousTestPoster t1_ival53k wrote
How is this idea different from using pre-trained networks (functions) and then adapting them to a new problem context?
smallest_meta_review OP t1_ivam34g wrote
Yeah, or even across different classes of RL methods: reusing a policy to train a value-based method (e.g., DQN) or a model-based RL method.
smallest_meta_review OP t1_ivancqx wrote
Good question. I feel it's going one step further and asking: why not reuse prior computational work (e.g., existing learned agents) on the same problem, especially if that problem is computationally demanding (large-scale RL projects do this, but research papers typically don't)? So, the next time we train a new RL agent, we reuse prior computation rather than starting from scratch (e.g., we train new agents on Atari games given a pretrained DQN agent from 2015).
Also, in reincarnating RL, we don't have to stick to the same pretrained network architecture and can possibly try some other architecture too.
smallest_meta_review OP t1_ivanqcm wrote
Haha, if you have tons of compute and several lifetimes to wait for tabula rasa RL to solve real problems :)
[deleted] t1_ivb0jji wrote
_der_erlkonig_ t1_ivb0pya wrote
Not to be that guy, but it kind of seems like this is just finally acknowledging that distillation is a good idea for RL too. They even use the teacher student terminology. Distilling a teacher to a student with a different architecture is something they make a big deal about in the paper, but people have been doing this for years in supervised learning. It's neat and important work, but the RRL branding is obnoxious and unnecessary IMO.
From a scientific standpoint, I think this methodology is also less useful than the authors advertise. Differently from supervised learning, RL is infamously sensitive to initial conditions, and adding another huge variable like the exact form of distillation used (which may reduce compute used) will make it even more difficult to isolate the source of "gains" in RL research.
DanJOC t1_ivbg48k wrote
Essentially a GAN
luchins t1_ivbuz90 wrote
> I feel it's going one step further and saying why not reuse prior computational work (e.g., existing learned agents) in the same problem
Could you give me an example, please? I don't get what you mean by using agents with different architectures.
TheLastVegan t1_ivbvx23 wrote
>As reincarnating RL leverages existing computational work (e.g., model checkpoints), it allows us to easily experiment with such hyperparameter schedules, which can be expensive in the tabula rasa setting. Note that when fine-tuning, one is forced to keep the same network architecture; in contrast, reincarnating RL grants flexibility in architecture and algorithmic choices, which can surpass fine-tuning performance (Figures 1 and 5).
Okay so agents can communicate weights between architectures. That's a reasonable conclusion. Sort of like a parent teaching their child how to human.
I thought language models already do this at inference time. So the goal of the RRL method is to subvert the agent's trust..?
smallest_meta_review OP t1_ivcf2tb wrote
While the critique is fair, if the alternative is to always train agents from scratch, then reincarnating RL seems like the more reasonable option. Furthermore, dependence on prior computation doesn't stop NLP / vision researchers from reusing pretrained models, so it seems worthwhile to do so in RL research too.
Re the role of distillation: the paper combines online distillation (DAgger) + RL to increase model capacity (rather than decrease capacity, as is typical in SL) and weans off the distillation loss over time so the agent is eventually trained with the RL loss alone; the paper calls this a simple baseline. Also, it's unclear what the best way is to reuse prior computation given in a form other than learned agents, which is what the paper argues we should study.
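To make the baseline concrete, it's roughly a distillation term plus the usual RL loss, with the distillation weight annealed to zero over training (a hedged PyTorch-style sketch of the general recipe, not the paper's exact implementation; tensor names and the temperature are assumptions):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a QDagger-style objective; not the paper's exact implementation.

def reincarnation_loss(student_q, teacher_q, td_target, actions,
                       distill_weight, temperature=1.0):
    """student_q, teacher_q: [batch, num_actions] Q-values from the new and the prior agent.
    td_target: [batch] bootstrapped targets for the usual TD loss.
    distill_weight: annealed from ~1 to 0, after which only the RL loss remains."""
    # Standard TD loss on the student's Q-values for the taken actions.
    q_taken = student_q.gather(1, actions.unsqueeze(1)).squeeze(1)
    rl_loss = F.smooth_l1_loss(q_taken, td_target)

    # Distill the (frozen) teacher's policy, i.e. a softmax over its Q-values, into the student.
    teacher_probs = F.softmax(teacher_q.detach() / temperature, dim=1)
    student_log_probs = F.log_softmax(student_q / temperature, dim=1)
    distill_loss = -(teacher_probs * student_log_probs).sum(dim=1).mean()

    return rl_loss + distill_weight * distill_loss

# The weaning schedule could be as simple as a linear decay, e.g.:
# distill_weight = max(0.0, 1.0 - step / num_distill_steps)
```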
Re source of gains, if the aim is to benchmark RL methods in an RRL context, all methods would use the exact same prior computation and same reincarnating RL method for fair comparison. In this setup, it's likely that the supervised learning losses (if used) would add stability to the RL training process.
smallest_meta_review OP t1_ivcghme wrote
Oh, so one of the examples in the blog post is that we start with a DQN agent using a 3-layer CNN architecture and reincarnate a Rainbow agent with a ResNet architecture (Impala-CNN) via the QDagger approach. Once reincarnated, the ResNet Rainbow agent is trained further with RL to maximize reward. See the paper for more details: https://openreview.net/forum?id=t3X5yMI_4G2
veshneresis t1_ivdazlf wrote
What are you seeing as the similarity to a GAN? Not sure I can really see how it’s similar?
Nameless1995 t1_ivhscyv wrote
> (rather than decrease capacity akin to SL)
Distillation in the supervised literature doesn't always reduce the student's capacity. I believe iterative distillation and similar setups have also been explored, where the student has the same capacity but distillation leads to better calibration or something along those lines, I forget. (https://arxiv.org/abs/2206.08491, https://proceedings.neurips.cc/paper/2020/hash/1731592aca5fb4d789c4119c65c10b4b-Abstract.html)
smallest_meta_review OP t1_ivhz0g2 wrote
Interesting. So self-distillation uses the same-capacity model for the student and the teacher -- are there papers which significantly increase model capacity? I thought the main use of distillation in SL was reducing inference time, but I'd be interested to know of cases where we actually use a much bigger student model.
Nameless1995 t1_ivi33nf wrote
I am not sure. It's not my area of research; I learned of some of these ideas in a presentation someone gave years ago. Some of these recent papers essentially draw a connection between distillation and label smoothing (essentially a way to provide "soft" labels -- this probably connects up with mixup techniques too). On that ground, you can justify using any kind of teacher/student, I think. Based on the label-smoothing connection, some papers go for "teacher-free" distillation, and others introduce a "lightweight" teacher instead (I'm not sure whether the lightweight teacher has lower capacity than the student, which would make it what you were looking for -- a student with higher capacity; I haven't read it beyond the abstract, I just found it a few minutes ago from googling): https://arxiv.org/pdf/2005.09163.pdf (it doesn't seem like a very popular paper, though, given it was published on arXiv in 2020 and has only 1 citation). It looks like a similar idea to self-distillation was also explored under the moniker of "born-again networks" (similar also to the reincarnation moniker): https://arxiv.org/abs/1805.04770
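The label-smoothing connection is basically that both replace the hard one-hot target with a softer distribution; the only difference is where the soft mass comes from (a small NumPy sketch with made-up numbers):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

num_classes, correct_class = 5, 2
hard_target = np.eye(num_classes)[correct_class]

# Label smoothing: mix the one-hot target with a uniform distribution.
eps = 0.1
smoothed_target = (1 - eps) * hard_target + eps / num_classes

# Distillation: the soft target comes from a teacher's temperature-scaled logits instead,
# so the mass on the wrong classes is structured rather than uniform.
teacher_logits = np.array([1.0, 0.5, 4.0, -1.0, 0.2])
distill_target = softmax(teacher_logits, temperature=2.0)

print(smoothed_target.round(3))   # uniform mass on wrong classes
print(distill_target.round(3))    # teacher-informed mass on wrong classes
```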
smallest_meta_review OP t1_ivjle6n wrote
Thanks for your informative reply. If you're interested, we previously applied results from self-distillation to show that implicit regularization can actually lead to capacity loss in RL, since bootstrapping can be viewed as self-distillation: https://drive.google.com/file/d/1vFs1FDS-h8HQ1J1rUKCgpbDlKTCZMap-/view?usp=drivesdk
TiredOldCrow t1_iv8tqar wrote
I know it's naive to expect machine learning to imitate life too closely, but for animals, "models" that are successful enough to produce offspring pass on elements of those "weights" to their children through nature+nurture.
The idea of weighting more successful previous models more heavily when "reincarnating" future models, and potentially borrowing some concepts from genetic algorithms with respect to combining multiple successful models, seems interesting to me.