blazejd OP t1_ix8il0p wrote
Reply to comment by waa007 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Can you rephrase the last part of your second sentence? Don't quite get what you mean.
koiRitwikHai t1_ix95poh wrote
It meant the same as "how will you define an objective function?"
waa007 t1_ixayg3f wrote
In general RL, the environment gives the agent an accurate reward after each step. In NLP, it's hard to give an accurate reward unless there is a real person teaching the agent.
So I think the main problem is how to give an accurate reward.
Sorry, this has little to do with GANs.
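To make the contrast concrete, here is a minimal sketch (the toy environment, the gym-style `step` interface, and the `nlp_reward` placeholder are my own illustration, not anything from the thread): a standard RL environment can return an exact scalar reward after every action, while free-text generation has no such oracle unless a human rater, or a reward model trained on human judgments, supplies the score.

```python
# Toy illustration of the reward-signal gap described above.

class GridWorld:
    """Hypothetical environment with a well-defined per-step reward."""
    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += action                      # move left (-1) or right (+1)
        reward = 1.0 if self.pos == 3 else 0.0  # exact, automatic reward signal
        done = self.pos == 3
        return self.pos, reward, done

# In language generation there is no such oracle: after the model emits a
# sentence, nothing in the "environment" scores it automatically, unless a
# human rater (or a learned reward model) supplies the judgment.

def nlp_reward(generated_sentence, human_rating=None):
    """Placeholder: the reward only exists if a person, or a model of a
    person, rates the output."""
    if human_rating is None:
        raise ValueError("No automatic reward signal exists for free text.")
    return human_rating
```

This is only meant to show why defining the objective is the sticking point: the `GridWorld` reward is computed mechanically, while `nlp_reward` has nothing to return without an external judgment.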