waa007 t1_ixayg3f wrote
Reply to comment by blazejd in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
In general RL, the environment returns an accurate reward after each step the agent takes. In NLP, it's hard to define an accurate reward unless there is a real person teaching the agent.
So I think how to give an accurate reward is the main problem.
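As a rough illustration (my own toy sketch, not from any paper): in a classic RL environment the reward is a well-defined number at every step, while for text generation nothing in the environment itself can score a token without a human rater or a reward model trained on human judgments.

```python
# Toy sketch: dense, well-defined reward in classic RL vs. undefined reward in text generation.

# Classic RL: the environment hands back an exact reward after every step.
def gridworld_step(state, action):
    next_state = max(0, min(4, state + action))  # 1-D grid with 5 cells
    reward = 1.0 if next_state == 4 else 0.0     # precise, comes from the environment itself
    return next_state, reward

# Language "environment": after the agent emits a token, there is no
# built-in scoring function -- the reward has to come from a person
# (or a model trained on human judgments).
def text_step(prefix, next_token):
    new_prefix = prefix + [next_token]
    reward = None  # undefined without a human / learned reward model
    return new_prefix, reward
```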
Sorry that this has so little connection to GANs.