VP4770 t1_j7186vz wrote on February 3, 2023 at 9:55 AM

Reply to comment by _Arsenie_Boca_ in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta

This