Submitted by JClub t3_10emf7a in MachineLearning
JClub OP t1_j4v5d0y wrote
Reply to comment by koolaidman123 in [D] RLHF - What type of rewards to use? by JClub
Ah right, then you can just use the reward model's output directly, or pass it through a sigmoid so that the reward lies between 0 and 1!
Do you think that the sigmoid is needed?
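To make the two options concrete, here is a minimal sketch of what I mean, assuming the reward model outputs one unbounded scalar per sample. The `raw_rewards` values are made up for illustration and `sigmoid` is just a helper I wrote, not part of any RLHF library.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function; squashes a real score into the 0-1 range."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Hypothetical raw scores from a reward model (illustrative values only).
raw_rewards = [-3.2, -0.5, 0.0, 1.7, 4.1]

# Option 1: use raw_rewards as the reward directly.
# Option 2: squash each score through a sigmoid first.
squashed = [sigmoid(r) for r in raw_rewards]

for raw, sq in zip(raw_rewards, squashed):
    print(f"raw={raw:+.2f} -> sigmoid={sq:.3f}")
```

Since the sigmoid is monotonic, the ranking of the samples stays the same either way; only the scale of the reward changes.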