crazymonezyy t1_j4yjtuz wrote on January 19, 2023 at 3:44 AM

Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

Among other things, RL's major benefit is learning from a sequence of rewards rather than just "a reward", which is what you would be assuming if you treated this as an SL problem. Remember that IID observations are one of the fundamental premises of SL.
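
A rough sketch of the difference, not taken from any RLHF implementation: the function name `discounted_returns` and the discount factor `gamma` are illustrative assumptions. In the RL view, each step is credited with all the reward that follows it, so steps in one trajectory are tied together instead of being scored as independent samples.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return the discounted sum of future rewards for each step.

    Hypothetical helper for illustration; `gamma` is an assumed
    discount factor in [0, 1].
    """
    returns = []
    running = 0.0
    # Walk backwards so each step accumulates everything after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# One trajectory where only the final step is rewarded.
rewards = [0.0, 0.0, 1.0]

# SL-style view: each step is scored by its own reward only,
# so the first two steps look worthless.
per_step_targets = rewards

# RL-style view: earlier steps get credit for the later reward.
rl_targets = discounted_returns(rewards, gamma=0.5)
print(per_step_targets)
print(rl_targets)
```

In the SL-style targets the early steps carry no signal at all, while the RL-style targets propagate the final reward back through the sequence, which is exactly where the IID assumption stops holding.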