
chip_0 t1_j5naknl wrote

Have you used RL from Human Feedback (RLHF) to fine-tune it yet?

I have an idea about how to use RLHF without expensive human annotation. Let me know if you would like to collaborate on that!
