mtocrat t1_j4zecpm wrote on January 19, 2023 at 9:10 AM

Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

What you're describing is a general approach to RL that shows up in different forms in many methods: sample actions, weight or rank them in some way by their estimated return, then regress the policy onto the weighted actions. So you're not proposing something other than RL; you're proposing to replace one RL approach with a different RL approach.
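
To make the sample / weight / regress loop concrete, here is a minimal sketch on a toy bandit with a categorical policy. Everything in it is an illustrative assumption rather than anything from the linked post: the function names, the toy reward table, the exponential weighting, and the temperature are all hypothetical, and exponentiated returns are just one common way to do the "weight" step.

```python
import numpy as np

def reward_weighted_step(probs, reward_fn, rng, n_samples=512, temperature=0.5):
    """One iteration: sample actions, weight by return, regress onto them."""
    n_actions = len(probs)
    # 1. Sample actions from the current policy.
    actions = rng.choice(n_actions, size=n_samples, p=probs)
    # 2. Weight each sample by its exponentiated return (assumed weighting).
    returns = reward_fn(actions)
    weights = np.exp((returns - returns.max()) / temperature)
    # 3. Regress to the weighted actions. For a categorical policy the
    #    weighted maximum-likelihood fit is the normalized weight per action.
    totals = np.bincount(actions, weights=weights, minlength=n_actions)
    totals = totals + 1e-6  # small smoothing so every action stays sampleable
    return totals / totals.sum()

def run_demo(seed=0, n_iters=20):
    rng = np.random.default_rng(seed)
    true_rewards = np.array([0.1, 0.5, 1.0, 0.3])  # hypothetical toy rewards
    reward_fn = lambda a: true_rewards[a]
    probs = np.full(4, 0.25)  # start from a uniform policy
    for _ in range(n_iters):
        probs = reward_weighted_step(probs, reward_fn, rng)
    return probs

if __name__ == "__main__":
    print(run_demo())
```

Here the policy ends up concentrating on the highest-reward action without any explicit policy-gradient step, which is the sense in which this is still RL, just a different flavor of it.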