Submitted by [deleted] t3_11d4ka5 in MachineLearning
gniorg t1_ja7sjkn wrote
Reply to comment by cthorrez in [D] Is RL dead/worth researching these days? by [deleted]
So basically, batch reinforcement learning / offline RL? That family of algorithms is useful for recommender systems, among other applications.
cthorrez t1_ja8d6oc wrote
Not exactly. In batch RL, the training data are real (state, action, next state, reward) tuples collected from real agents interacting with real environments.
The policy is then improved offline. In RLHF there is actually no environment, and the policy is just standard LLM decoding.
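To make the distinction concrete, here is a rough Python sketch of what a single training record might look like in each setting. All names here (`Transition`, `RLHFSample`, `score_responses`, `toy_decode`, `toy_reward_model`) are made up for illustration, and the toy decoder and reward function are stand-ins, not a real LLM or a real learned reward model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    """Offline / batch RL: one logged step from a real agent in a real environment."""
    state: str
    action: str
    next_state: str
    reward: float


@dataclass
class RLHFSample:
    """RLHF-style record: a prompt and a decoded response, with a score attached.

    There is no next_state field because no environment is stepped.
    """
    prompt: str
    response: str
    reward: float


def score_responses(
    prompts: List[str],
    decode: Callable[[str], str],
    reward_model: Callable[[str, str], float],
) -> List[RLHFSample]:
    """Decode one response per prompt and score it (assumed setup, not a full RLHF loop)."""
    samples = []
    for prompt in prompts:
        response = decode(prompt)  # the "policy" is just decoding
        samples.append(RLHFSample(prompt, response, reward_model(prompt, response)))
    return samples


# Toy stand-ins so the sketch runs end to end (hypothetical, for illustration only).
def toy_decode(prompt: str) -> str:
    return prompt.upper()


def toy_reward_model(prompt: str, response: str) -> float:
    return float(len(response))


if __name__ == "__main__":
    logged = Transition(state="s0", action="a0", next_state="s1", reward=1.0)
    print(logged)
    print(score_responses(["hello"], toy_decode, toy_reward_model))
```

The point of the sketch is only the shape of the data: the offline RL record carries a next state produced by an environment, while the RLHF-style record stops at the decoded response and its score.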