Nameless1995 t1_ivi33nf wrote on November 8, 2022 at 3:01 AM

Reply to comment by smallest_meta_review in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review

I am not sure. It's not my area of research. I learned of some of these ideas in a presentation made by someone years ago. Some of these recent paper essentially draws connection between distillation and label smoothing (essentially a way to provide "soft" labels -- this probably connects up with mixup techniques too). So on that ground, you can justify using any kind of teacher/student I think. Based on the label smoothing connection some paper goes for "teacher-free" distillation. And some others seem to be introducing "lightweight" teacher instead (I am not sure if the lightweight teacher is lower capacity than the student which would make it what you were looking for -- students having higher capacities. I haven't really read it beyond the abstract - just found it a few minutes ago from googling): https://arxiv.org/pdf/2005.09163.pdf (doesn't seem like a very popular paper though given it was published in arxiv in 2020 and have only 1 citation). Looks like a similar idea as to self-distillation was also available under the moniker of "born-again networks" (similar to also the reincarnation monker): https://arxiv.org/abs/1805.04770

smallest_meta_review OP t1_ivjle6n wrote on November 8, 2022 at 1:25 PM

Thanks for your informative reply. If interested, we have previously applied results from self-distillation to show that implicit regularization can actually lead to capacity loss in RL as bootstrapping can be viewed as self-distillation: https://drive.google.com/file/d/1vFs1FDS-h8HQ1J1rUKCgpbDlKTCZMap-/view?usp=drivesdk