Submitted by vidul7498 t3_11itl7g in MachineLearning
currentscurrents t1_jazwqft wrote
The reason you want to do RL is that there are problem settings where RL is the only way to learn a solution.
Unsupervised learning can teach a model to understand the world, and supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.
Trouble is, the training signal in reinforcement learning carries far less information (a single scalar reward rather than a full target output), so you need ridiculous amounts of experience. Current thinking is that you use unsupervised learning to learn a world model + RL to learn how to achieve goals inside that model. This combination has worked very well for things like DreamerV3.
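Roughly, that two-phase recipe looks like the sketch below. This is a toy stand-in, not DreamerV3 itself: the environment, network sizes, and horizon are all made up for illustration, and Dreamer additionally works in a learned latent space with an actor-critic and λ-returns.

```python
import torch
import torch.nn as nn

D = 4  # toy setup: observations and actions share a dimension

# World model: (obs, action) -> (next_obs, reward). Trained like ordinary
# regression on logged transitions -- the unsupervised/self-supervised part.
model = nn.Sequential(nn.Linear(2 * D, 64), nn.Tanh(), nn.Linear(64, D + 1))
# Policy: obs -> action in [-1, 1].
policy = nn.Sequential(nn.Linear(D, 64), nn.Tanh(), nn.Linear(64, D), nn.Tanh())

def env_step(obs, act):
    # Made-up environment: actions nudge the state; reward for staying near 0.
    nxt = obs + 0.1 * act
    return nxt, -(nxt ** 2).sum(-1, keepdim=True)

# Phase 1 (unsupervised): fit the world model on random-policy experience.
opt_m = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    obs = torch.randn(256, D)
    act = torch.rand(256, D) * 2 - 1
    nxt, rew = env_step(obs, act)
    pred = model(torch.cat([obs, act], -1))
    loss = ((pred - torch.cat([nxt, rew], -1)) ** 2).mean()
    opt_m.zero_grad(); loss.backward(); opt_m.step()

# Phase 2 (RL): never touch the real environment again. Unroll the policy
# inside the frozen model and backprop the *predicted* return into the policy.
for p in model.parameters():
    p.requires_grad_(False)
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(1000):
    obs = torch.randn(256, D)
    ret = 0.0
    for _ in range(10):  # imagination horizon
        pred = model(torch.cat([obs, policy(obs)], -1))
        obs, rew = pred[:, :D], pred[:, D:]
        ret = ret + rew
    opt_p.zero_grad()
    (-ret.mean()).backward()  # maximize imagined return
    opt_p.step()
```

The point of the split is that phase 1 gets a dense, supervised-style signal from every transition, while the sparse RL part only has to search inside the cheap learned model.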
thiru_2718 t1_jb6njez wrote
>supervised learning can teach a model to complete a human-defined task. But reinforcement learning can teach a model to choose its own tasks to complete arbitrary goals.
Isn't this contradicted by LLMs demonstrating emergent abilities (like meta-learning strategies, or in-context learning) that allow them to tackle complex sequential tasks adaptively? There is research (e.g. https://innermonologue.github.io/) where LLMs are successfully applied to a traditional RL domain: planning and interaction for robots. While there is RLHF involved in models like ChatGPT, the bulk of the model's reasoning comes from supervised learning.
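The loop in that paper is (roughly) the LLM proposing the next action in text, with environment feedback appended back into the prompt. A hypothetical sketch of that pattern, where `llm` and `execute` are placeholder stubs rather than any real model or robot API:

```python
def llm(prompt: str) -> str:
    # Placeholder for an instruction-tuned language model; a real system
    # would send `prompt` to a model and return its completion.
    return "done"

def execute(action: str) -> str:
    # Placeholder for the robot/simulator; returns a textual observation.
    return "success"

def plan_and_act(task: str, max_steps: int = 10) -> str:
    # The "inner monologue": a growing transcript of actions and feedback
    # that the LLM re-reads at every step. No RL update happens anywhere.
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        action = llm(transcript + "Next action:").strip()
        if action.lower() == "done":
            break
        feedback = execute(action)  # e.g. "success" or "the drawer is stuck"
        transcript += f"Action: {action}\nFeedback: {feedback}\n"
    return transcript

print(plan_and_act("put the apple in the drawer"))
```

All the "learning" here happens in-context: behavior adapts within an episode purely because the prompt accumulates feedback, which is exactly the kind of emergent ability I mean.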
As far as I can tell, the unexpected emergent abilities of LLMs have somewhat rewritten our assumptions about what is possible through supervised learning, and that rethinking should be extended into the RL domain.