Submitted by smallest_meta_review t3_yng63w in MachineLearning
Comments
essahjott t1_iv9mkt6 wrote
life_is_harsh t1_iva1h5l wrote
I feel both are useful, no? I thought of reincarnation as how humans learn: we don't learn from a blank slate but often reuse our own learned knowledge or learn from others during our lifetime (e.g., when learning to play a sport, we might learn from an instructor but eventually learn on our own).
smallest_meta_review OP t1_iva1nr2 wrote
https://agarwl.github.io/reincarnating_rl for paper, code, blog post and trained agents.
smallest_meta_review OP t1_iva27vt wrote
While nurture + nature seems useful across lifetimes, reincarnation might be how we learn during our lifetimes? I am not an expert but I found this comment interesting:
> This must be a fundamental part of how primates like us learn, piggybacking off of an existing policy at some level, so I'm all for RL research that tries to formalize ways it can work computationally.
smallest_meta_review OP t1_iva2n3z wrote
LOL. This is the first thing I clarify whenever I talk about this work. Here it's in the context of reincarnating an existing RL agent into a new agent (possibly with a different architecture and algorithm).
BobDope t1_iva3q3o wrote
Ok that’s pretty dope
smallest_meta_review OP t1_iva4dj7 wrote
> Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.
More at https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1
whothatboah t1_ivadajt wrote
Very ignorantly speaking, but this gives off a bit of a genetic algorithm vibe...
smurfpiss t1_ivaf5ia wrote
Not very experienced with RL, but how is that different from an algorithm going through training iterations?
In that case the parameters are tweaked from past learned parameters. What's the benefit of learning from another algorithm? Is it some kind of weird offspring of skip connections and transfer learning?
smallest_meta_review OP t1_ivaghqa wrote
Good question. The original blog post somewhat covers this:
> Imagine a researcher who has trained an agent A_1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, Reincarnating RL provides the more viable option of transferring the existing agent A_1 to a different agent and training this agent further, or simply fine-tuning A_1.
But this is not what typically happens in research. For example, each time we train a new agent to, say, play an Atari game, we train it from scratch, ignoring all the prior agents trained on that game. This work argues: why not reuse the learned knowledge from existing agents while training new agents (which may be totally different)?
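As a toy illustration of the two workflows (just a sketch; the Agent class, method names, and step counts are made up, not the actual API):

```python
# Toy sketch of the two workflows; the Agent class, method names, and step counts are made up.

class Agent:
    def __init__(self, arch):
        self.arch = arch
        self.env_steps = 0

    def train(self, steps):
        # Stand-in for running RL in the environment.
        self.env_steps += steps

    def kickstart_from(self, teacher):
        # Stand-in for reusing the teacher's computation (e.g., distilling its policy).
        self.teacher_arch = teacher.arch


# Tabula rasa workflow: every new architecture/algorithm pays the full training cost again.
scratch_agent = Agent(arch="impala_resnet")
scratch_agent.train(steps=200_000_000)

# Reincarnating RL: transfer the existing agent A_1 into a (possibly very different) new agent,
# then keep training with RL -- ideally for far fewer environment steps.
a1 = Agent(arch="nature_dqn_cnn")       # pretend this was trained years ago
new_agent = Agent(arch="impala_resnet")
new_agent.kickstart_from(a1)
new_agent.train(steps=20_000_000)
```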
smurfpiss t1_ivah7ul wrote
So, transfer learning but with different architectures? That's pretty neat. Will give it a read thanks 😊
pm_me_your_pay_slips t1_ivai3l1 wrote
I THOUGHT REWARD WAS ALL YOU NEED
Dendriform1491 t1_ivaj27w wrote
At least in nature this happens because the environment is always changing and the value of training decays (some sort of "data drift").
ingambe t1_ivaj6e5 wrote
Evolution strategies work similarly to the process you described. For very small neural networks, they work very well, especially in environments with sparse or quasi-sparse rewards. But as soon as you try a larger neural net (CNN + MLP, or a Transformer-like arch), the process becomes super noisy and you either need to produce tons of offspring for the population or use gradient-based techniques.
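For intuition, the basic ES loop looks roughly like this (a minimal NumPy sketch of an OpenAI-ES-style update on a toy objective, not a full RL implementation; in RL the fitness would be an episodic return):

```python
import numpy as np

# Minimal OpenAI-ES-style update on a toy objective; in RL, fitness(theta) would be the
# episodic return of a policy with parameters theta (here it's just a quadratic bowl).

def fitness(theta):
    return -np.sum((theta - 3.0) ** 2)    # maximized at theta = 3 in every coordinate

rng = np.random.default_rng(0)
theta = np.zeros(10)                      # "parent" parameters (e.g., flattened network weights)
sigma, lr, n_offspring = 0.1, 0.02, 50

for generation in range(300):
    # Produce a population of perturbed offspring around the current parameters.
    noise = rng.standard_normal((n_offspring, theta.size))
    returns = np.array([fitness(theta + sigma * eps) for eps in noise])

    # Rank-normalize returns so the update is robust to reward scale and noise.
    ranks = returns.argsort().argsort()
    weights = ranks / (n_offspring - 1) - 0.5

    # Move the parent toward the better-performing perturbations (a gradient estimate).
    theta += lr / (n_offspring * sigma) * (noise.T @ weights)

print(theta.round(2))                     # should be close to 3.0 everywhere
```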
No_Contribution9334 t1_ivajqhv wrote
So well explained!
anonymousTestPoster t1_ival53k wrote
How is this idea different from using pre-trained networks (functions) and then adapting them to a new problem context?
smallest_meta_review OP t1_ivam34g wrote
Yeah, or even across different classes of RL methods: reusing a policy to train a value-based method (e.g., DQN) or a model-based RL method.
smallest_meta_review OP t1_ivancqx wrote
Good question. I feel it's going one step further and asking: why not reuse prior computational work (e.g., existing learned agents) on the same problem, especially if that problem is computationally demanding (large-scale RL projects do this, but research papers typically don't)? So, the next time we train a new RL agent, we reuse prior computation rather than starting from scratch (e.g., we train new agents on Atari games given a pretrained DQN agent from 2015).
Also, in reincarnating RL, we don't have to stick to the same pretrained network architecture and can possibly try some other architecture too.
smallest_meta_review OP t1_ivanqcm wrote
Haha, if you have tons of compute and several lifetimes to wait for tabula rasa RL to solve real problems :)
[deleted] t1_ivb0jji wrote
_der_erlkonig_ t1_ivb0pya wrote
Not to be that guy, but it kind of seems like this is just finally acknowledging that distillation is a good idea for RL too. They even use the teacher student terminology. Distilling a teacher to a student with a different architecture is something they make a big deal about in the paper, but people have been doing this for years in supervised learning. It's neat and important work, but the RRL branding is obnoxious and unnecessary IMO.
From a scientific standpoint, I think this methodology is also less useful than the authors advertise. Differently from supervised learning, RL is infamously sensitive to initial conditions, and adding another huge variable like the exact form of distillation used (which may reduce compute used) will make it even more difficult to isolate the source of "gains" in RL research.
DanJOC t1_ivbg48k wrote
Essentially a GAN
luchins t1_ivbuz90 wrote
> I feel it's going one step further and saying why not reuse prior computational work (e.g., existing learned agents) in the same problem
Could you give me an example, please? I don't get what you mean by using agents with different architectures.
TheLastVegan t1_ivbvx23 wrote
>As reincarnating RL leverages existing computational work (e.g., model checkpoints), it allows us to easily experiment with such hyperparameter schedules, which can be expensive in the tabula rasa setting. Note that when fine-tuning, one is forced to keep the same network architecture; in contrast, reincarnating RL grants flexibility in architecture and algorithmic choices, which can surpass fine-tuning performance (Figures 1 and 5).
Okay so agents can communicate weights between architectures. That's a reasonable conclusion. Sort of like a parent teaching their child how to human.
I thought language models already do this at inference time. So the goal of the RRL method is to subvert the agent's trust..?
smallest_meta_review OP t1_ivcf2tb wrote
While the critique is fair, if the alternative is to always train agents from scratch, then reincarnating RL seems like the more reasonable option. Furthermore, dependence on prior computation doesn't stop NLP / vision researchers from reusing pretrained models, so it seems worthwhile to do so in RL research too.
Re the role of distillation: the paper combines online distillation (DAgger) + RL to increase model capacity (rather than decrease capacity, as is typical in SL) and weans off the distillation loss over time so the agent is eventually trained with the RL loss alone; the paper calls this a simple baseline. Also, it's unclear what the best way is to reuse prior computation given in a form other than learned agents, which is what the paper argues we should study.
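To make the baseline concrete, it's roughly a distillation term plus the usual RL loss, with the distillation weight annealed to zero over training (a hedged PyTorch-style sketch of the general recipe, not the paper's exact implementation; tensor names and the temperature are assumptions):

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a QDagger-style objective; not the paper's exact implementation.

def reincarnation_loss(student_q, teacher_q, td_target, actions,
                       distill_weight, temperature=1.0):
    """student_q, teacher_q: [batch, num_actions] Q-values from the new and the prior agent.
    td_target: [batch] bootstrapped targets for the usual TD loss.
    distill_weight: annealed from ~1 to 0, after which only the RL loss remains."""
    # Standard TD loss on the student's Q-values for the taken actions.
    q_taken = student_q.gather(1, actions.unsqueeze(1)).squeeze(1)
    rl_loss = F.smooth_l1_loss(q_taken, td_target)

    # Distill the (frozen) teacher's policy, i.e. a softmax over its Q-values, into the student.
    teacher_probs = F.softmax(teacher_q.detach() / temperature, dim=1)
    student_log_probs = F.log_softmax(student_q / temperature, dim=1)
    distill_loss = -(teacher_probs * student_log_probs).sum(dim=1).mean()

    return rl_loss + distill_weight * distill_loss

# The weaning schedule could be as simple as a linear decay, e.g.:
# distill_weight = max(0.0, 1.0 - step / num_distill_steps)
```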
Re source of gains, if the aim is to benchmark RL methods in an RRL context, all methods would use the exact same prior computation and same reincarnating RL method for fair comparison. In this setup, it's likely that the supervised learning losses (if used) would add stability to the RL training process.
smallest_meta_review OP t1_ivcghme wrote
Oh, so one of the examples in the blog post is that we start with a DQN agent using a 3-layer CNN architecture and reincarnate a Rainbow agent with a ResNet architecture (Impala-CNN) via the QDagger approach. Once reincarnated, the ResNet Rainbow agent is trained further with RL to maximize reward. See the paper for more details: https://openreview.net/forum?id=t3X5yMI_4G2
veshneresis t1_ivdazlf wrote
What are you seeing as the similarity to a GAN? Not sure I can really see how it’s similar?
Nameless1995 t1_ivhscyv wrote
> (rather than decrease capacity akin to SL)
Distillation in the supervised literature doesn't always reduce the student's capacity. I believe iterative distillation and similar setups have also been explored, where the student has the same capacity but distillation leads to better calibration or something along those lines, I forget. (https://arxiv.org/abs/2206.08491, https://proceedings.neurips.cc/paper/2020/hash/1731592aca5fb4d789c4119c65c10b4b-Abstract.html)
smallest_meta_review OP t1_ivhz0g2 wrote
Interesting. So self-distillation uses the same-capacity model for the student and the teacher -- are there papers which significantly increase model capacity? I thought the main use of distillation in SL was reducing inference time, but I'd be interested to know of cases where we actually use a much bigger student model.
Nameless1995 t1_ivi33nf wrote
I am not sure. It's not my area of research; I learned of some of these ideas in a presentation someone gave years ago. Some of these recent papers essentially draw a connection between distillation and label smoothing (essentially a way to provide "soft" labels -- this probably connects up with mixup techniques too). On that ground, you can justify using any kind of teacher/student, I think. Based on the label-smoothing connection, some papers go for "teacher-free" distillation, and others introduce a "lightweight" teacher instead (I'm not sure whether the lightweight teacher has lower capacity than the student, which would make it what you were looking for -- a student with higher capacity; I haven't read it beyond the abstract, I just found it a few minutes ago from googling): https://arxiv.org/pdf/2005.09163.pdf (it doesn't seem like a very popular paper, though, given it was published on arXiv in 2020 and has only 1 citation). It looks like a similar idea to self-distillation was also explored under the moniker of "born-again networks" (similar also to the reincarnation moniker): https://arxiv.org/abs/1805.04770
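The label-smoothing connection is basically that both replace the hard one-hot target with a softer distribution; the only difference is where the soft mass comes from (a small NumPy sketch with made-up numbers):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

num_classes, correct_class = 5, 2
hard_target = np.eye(num_classes)[correct_class]

# Label smoothing: mix the one-hot target with a uniform distribution.
eps = 0.1
smoothed_target = (1 - eps) * hard_target + eps / num_classes

# Distillation: the soft target comes from a teacher's temperature-scaled logits instead,
# so the mass on the wrong classes is structured rather than uniform.
teacher_logits = np.array([1.0, 0.5, 4.0, -1.0, 0.2])
distill_target = softmax(teacher_logits, temperature=2.0)

print(smoothed_target.round(3))   # uniform mass on wrong classes
print(distill_target.round(3))    # teacher-informed mass on wrong classes
```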
smallest_meta_review OP t1_ivjle6n wrote
Thanks for your informative reply. If you're interested, we previously applied results from self-distillation to show that implicit regularization can actually lead to capacity loss in RL, since bootstrapping can be viewed as self-distillation: https://drive.google.com/file/d/1vFs1FDS-h8HQ1J1rUKCgpbDlKTCZMap-/view?usp=drivesdk
TiredOldCrow t1_iv8tqar wrote
I know it's naive to expect machine learning to imitate life too closely, but for animals, "models" that are successful enough to produce offspring pass on elements of those "weights" to their children through nature+nurture.
The idea of weighting more successful previous models more heavily when "reincarnating" future models, and potentially borrowing some concepts from genetic algorithms with respect to combining multiple successful models, seems interesting to me.