currentscurrents t1_jazwqft wrote

The reason you want to do RL is that there are problem settings where RL is the only way to learn a solution at all.

Unsupervised learning can teach a model to understand the world, and supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks in pursuit of arbitrary goals.

Trouble is, the training signal in reinforcement learning is a lot weaker (a scalar reward instead of a full target output), so you need ridiculous amounts of training data. Current thinking is that you use unsupervised learning to learn a world model, then RL to learn how to achieve goals inside that model. This combination has worked very well in agents like DreamerV3.
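Roughly, the recipe looks like the sketch below. This is a minimal toy in PyTorch, not DreamerV3 itself; the fake data, tiny networks, and imagination horizon are all illustrative assumptions. The point is just the two training signals: a dense unsupervised loss for the world model, and an RL objective computed entirely on imagined rollouts.

```python
# Minimal sketch of the "world model + RL" recipe, loosely Dreamer-style.
# Everything here (fake data, tiny networks, horizon) is illustrative.
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 4, 2, 16

# 1) World model, trained unsupervised on logged transitions (obs, act, next_obs, rew).
encoder = nn.Linear(obs_dim, latent_dim)
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, latent_dim), nn.Tanh())
decoder = nn.Linear(latent_dim, obs_dim)
reward_head = nn.Linear(latent_dim, 1)
wm_opt = torch.optim.Adam(
    [*encoder.parameters(), *dynamics.parameters(),
     *decoder.parameters(), *reward_head.parameters()], lr=1e-3)

# 2) Policy, trained by RL entirely on imagined rollouts inside the world model.
actor = nn.Linear(latent_dim, act_dim)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def world_model_step(obs, act, next_obs, rew):
    """Dense unsupervised update: predict the next observation and reward."""
    z_next = dynamics(torch.cat([encoder(obs), act], dim=-1))
    loss = ((decoder(z_next) - next_obs) ** 2).mean() \
         + ((reward_head(z_next).squeeze(-1) - rew) ** 2).mean()
    wm_opt.zero_grad()
    loss.backward()
    wm_opt.step()

def actor_step(start_obs, horizon=5):
    """RL update: backprop the imagined return through the learned dynamics."""
    z = encoder(start_obs).detach()
    imagined_return = torch.zeros(())
    for _ in range(horizon):
        a = torch.tanh(actor(z))           # continuous action keeps the gradient path
        z = dynamics(torch.cat([z, a], dim=-1))
        imagined_return = imagined_return + reward_head(z).mean()
    loss = -imagined_return                # maximize predicted reward
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()

# Random tensors stand in for a real environment; in practice you would
# alternate collecting experience with these two updates.
obs, act = torch.randn(32, obs_dim), torch.randn(32, act_dim)
next_obs, rew = torch.randn(32, obs_dim), torch.randn(32)
for _ in range(10):
    world_model_step(obs, act, next_obs, rew)
    actor_step(obs)
```

Note the asymmetry: the world model gets a full target vector every step, while the policy only ever sees a predicted scalar reward. That is exactly why the world model can absorb most of the data requirements.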

18

thiru_2718 t1_jb6njez wrote

>supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks in pursuit of arbitrary goals.

Isn't this contradicted by LLMs demonstrating emergent abilities (like meta-learning strategies or in-context learning) that allow them to tackle complex sequential tasks adaptively? There is research (e.g. https://innermonologue.github.io/) where LLMs are successfully applied to a traditional RL domain: planning and interaction for robots. While there is RLHF involved in models like ChatGPT, the bulk of the model's reasoning comes from supervised learning.
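Those inner-monologue-style setups boil down to a surprisingly simple loop, something like the sketch below. `call_llm` and `execute_in_env` are hypothetical stand-ins for a language-model call and a robot/simulator API, not the paper's actual code:

```python
# Hypothetical sketch of an inner-monologue-style control loop: the LLM plans
# in-context, and environment feedback is appended back into the prompt.
def call_llm(prompt: str) -> str:
    # Stand-in for a language-model call; a real system would query an LLM here.
    return "done" if "Feedback:" in prompt else "pick up the sponge"

def execute_in_env(action: str) -> str:
    # Stand-in for running a low-level skill and describing the outcome in text.
    return "success"

def inner_monologue(task: str, max_steps: int = 10) -> str:
    prompt = f"Task: {task}\n"
    for _ in range(max_steps):
        action = call_llm(prompt + "Next action:").strip()
        if action.lower() == "done":
            break
        feedback = execute_in_env(action)
        # Feedback becomes part of the context, so the model can replan in-context,
        # with no gradient updates and no explicit RL at decision time.
        prompt += f"Action: {action}\nFeedback: {feedback}\n"
    return prompt

print(inner_monologue("wipe the table"))
```

The interesting part is that the "policy improvement" happens purely in the context window, which is what makes it look like RL behavior coming out of a supervised model.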

As far as I can tell, the unexpected, emergent abilities of LLMs have somewhat rewritten our assumptions about what is achievable through supervised learning, and that rethinking should be extended into the RL domain.

−1