Submitted by JClub t3_10emf7a in MachineLearning
Hey everyone, I just saw the great presentation by Nathan Lambert on Reinforcement Learning from Human Feedback and wanted to try some RLHF on my language model. To do this, I first need to set up an experiment where I collect reward scores to train the reward model.
My question is: what rewards work best? Simply 👍/👎? A scale of 1-5? Ranking 4 different model outputs? There are a lot of options and I don't know which one to choose.
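For context, my rough understanding of the ranking option is that the K ranked outputs get turned into pairwise comparisons and the reward model is trained with a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)), like in the InstructGPT paper. Here's a quick untested sketch of what I mean (function and variable names are just placeholders I made up):

    import itertools
    import torch
    import torch.nn.functional as F

    def pairwise_ranking_loss(rewards_ranked: torch.Tensor) -> torch.Tensor:
        """rewards_ranked: shape (K,), reward-model scores for K outputs of the
        same prompt, ordered from best (index 0) to worst (index K-1) by the labeler."""
        losses = []
        for better, worse in itertools.combinations(range(rewards_ranked.shape[0]), 2):
            # a lower index means the labeler ranked that output higher
            losses.append(-F.logsigmoid(rewards_ranked[better] - rewards_ranked[worse]))
        return torch.stack(losses).mean()

    # Example: reward-model scores for 4 ranked outputs of one prompt
    scores = torch.tensor([1.3, 0.9, 0.2, -0.5], requires_grad=True)
    loss = pairwise_ranking_loss(scores)
    loss.backward()

So ranking gives you K*(K-1)/2 comparisons per prompt, whereas 👍/👎 or a 1-5 scale gives you a single (noisier?) scalar per output. Not sure which actually works better in practice, hence the question.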