idrajitsc t1_ix8lrk0 wrote on November 21, 2022 at 4:14 PM

Reply to comment by blazejd in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd

I mean, I'm not really sure what you're asking for. People do work on RL for NLP. It just doesn't offer any huge advantage, and the reason your intuition doesn't translate into an actual advantage is that writing a reward function that reproduces the human feedback a baby receives is essentially impossible. And not just in an "it's hard, but if we put enough work into it we can figure it out" kind of way.
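
To make the contrast concrete, here is a minimal sketch in plain Python. It isn't taken from any real training setup; the vocabulary, the logits, and `toy_reward` are all made up for illustration. The next-word loss only needs the token that actually came next in the text, while the REINFORCE-style loss needs a reward for whatever the model sampled, and someone has to write that reward function.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_index):
    """Cross-entropy for one step: the label is simply the next token in the corpus."""
    return -math.log(softmax(logits)[target_index])

def reinforce_loss(logits, sampled_index, reward):
    """REINFORCE-style surrogate loss for one sampled token: -reward * log p(sample)."""
    return -reward * math.log(softmax(logits)[sampled_index])

def toy_reward(token):
    # Hypothetical stand-in for "the feedback a baby receives".
    # A real version of this function is the part that is essentially impossible to write.
    return 1.0 if token == "milk" else 0.0

vocab = ["milk", "dog", "the"]   # made-up vocabulary
logits = [2.0, 0.5, 0.1]         # made-up model scores for the next token

# Supervised signal: comes for free from the text itself.
supervised = next_token_loss(logits, vocab.index("milk"))

# RL signal: only as good as the reward function behind it.
rl = reinforce_loss(logits, vocab.index("milk"), toy_reward("milk"))
```

In this toy case the two losses coincide whenever the reward is 1 for the token that actually came next, which is roughly why the RL framing doesn't buy you anything unless the reward carries information the text doesn't already have.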