_der_erlkonig_ t1_ivb0pya wrote

Not to be that guy, but it kind of seems like this is just finally acknowledging that distillation is a good idea for RL too. They even use the teacher-student terminology. Distilling a teacher into a student with a different architecture is something they make a big deal about in the paper, but people have been doing this for years in supervised learning. It's neat and important work, but the RRL branding is obnoxious and unnecessary IMO.
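
For anyone who hasn't seen it, the cross-architecture distillation I'm talking about is basically the standard supervised recipe: soften the teacher's logits and train the student to match them. A minimal sketch in PyTorch, where `student`, `teacher`, `T`, and `alpha` are just placeholder names and not anything from the paper:

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, y, optimizer, T=2.0, alpha=0.5):
    """One supervised KD step: KL between softened teacher and student
    distributions, blended with the usual cross-entropy on hard labels."""
    with torch.no_grad():
        teacher_logits = teacher(x)   # teacher can be any architecture
    student_logits = student(x)       # student architecture is independent

    kd_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce_loss = F.cross_entropy(student_logits, y)
    loss = alpha * kd_loss + (1 - alpha) * ce_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Nothing about this cares whether the teacher and student share an architecture, which is why the "different architecture" framing in the paper feels overstated to me.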

From a scientific standpoint, I think this methodology is also less useful than the authors advertise. Unlike supervised learning, RL is infamously sensitive to initial conditions, and adding another huge variable like the exact form of distillation used (which can also change how much compute is used) will make it even harder to isolate the source of "gains" in RL research.
