Submitted by JClub t3_10fh79i in MachineLearning
crazymonezyy t1_j4yjtuz wrote
Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
Among other things, RL's major benefit is that it learns from a sequence of rewards rather than simply "a reward", which is the assumption you make when you treat this as an SL problem. Remember that IID observations are one of the fundamental premises of SL.
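To make the distinction concrete, here is a minimal sketch of one common way a sequence of rewards gets used, assuming a standard discounted return. The function name, the toy reward list, and the gamma value are all made up for illustration; this is not the training code of any particular RLHF system.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: the reward only arrives at the last step.
rewards = [0.0, 0.0, 1.0]

# SL-style view: each step is an independent (IID) example with its own reward.
sl_targets = list(rewards)

# RL-style view: each step is credited with the rewards that follow it.
rl_targets = discounted_returns(rewards, gamma=0.5)

print("SL targets:", sl_targets)
print("RL targets:", rl_targets)
```

In the SL-style view the first two steps look worthless, while the return-based view still gives them credit for leading to the final reward.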