chip_0 t1_j5naknl wrote on January 24, 2023 at 5:11 AM

Have you used RL from Human Feedback (RLHF) to fine-tune it yet?

I have an idea about how to use RLHF without expensive human annotation. Let me know if you would like to collaborate on that!