Cheap_Meeting t1_ix4l30k wrote
Reply to comment by AlexGrinch in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
I don't think this is a good answer. Modeling the probability distribution of language is not a worthwhile goal by itself, which is why language modeling was a niche topic for a very long time. The reason there has been so much interest in large language models in the last couple of years is that they do, in some sense, "learn" language.
AlexGrinch t1_ix4tgps wrote
I would like to disagree with you. LM was a niche topic because we did not have the necessary tools to build really complex models able to capture even a fraction of the complexity and richness of natural language. Starting from Shannon's experiments with simple N-gram LMs, researchers returned to language modeling again and again. Finally, they got the tools to capture the underlying distribution (which is insanely complex and multimodal) really well.
If you manage to perfectly model the distribution of, for example, comments in threads on the ML subreddit, you can easily run it in a debate with me, and I will not be able to tell the difference.
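To make the N-gram point concrete, here's a toy bigram LM in Python (a minimal sketch of the kind of model Shannon experimented with; the corpus is a placeholder and there's no smoothing):

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigrams and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

print(bigram_prob("the", "cat"))  # 0.25 -- "the" precedes cat/mat/dog/rug equally
```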
Cheap_Meeting t1_ix520oo wrote
Rereading my own comment, it could have been phrased better. Let me try again:
I think you are taking OP's question too literally. At least as I understand it, the intent of OP's question was: "Why are self-supervised autoregressive models the predominant form of generative models for language? Intuitively it would seem that the training process should be closer to how humans learn language."
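To spell out what "self-supervised autoregressive" means in practice, here is a minimal next-token-prediction training step (toy model and random data, PyTorch assumed; real LLMs use Transformers rather than this small LSTM):

```python
import torch
import torch.nn as nn

# Toy autoregressive LM: embedding -> LSTM -> vocabulary logits.
vocab_size, dim = 100, 32
embed = nn.Embedding(vocab_size, dim)
lstm = nn.LSTM(dim, dim, batch_first=True)
head = nn.Linear(dim, vocab_size)
params = [*embed.parameters(), *lstm.parameters(), *head.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

tokens = torch.randint(0, vocab_size, (8, 16))  # stand-in batch of token ids

# Self-supervision: the targets are just the inputs shifted by one position.
inputs, targets = tokens[:, :-1], tokens[:, 1:]
hidden, _ = lstm(embed(inputs))
logits = head(hidden)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()
opt.step()
```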
idrajitsc t1_ix56883 wrote
I think that's still answered pretty well by their original comment: a probability distribution over sequences of words, given sufficient compute and good enough corpora, gets pretty close to the superficial aspects of language. And we can now learn such distributions well with LLMs, so why insist on RL instead?
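Concretely, by "a probability distribution over sequences of words" I just mean the chain-rule factorization that autoregressive LMs implement. A rough sketch, where `next_token_log_prob` stands in for whatever model you trained (both names are my own placeholders):

```python
def sequence_log_prob(tokens, next_token_log_prob):
    # Chain rule: log P(w_1..w_n) = sum over i of log P(w_i | w_1..w_{i-1}).
    return sum(
        next_token_log_prob(tokens[:i], tokens[i]) for i in range(len(tokens))
    )
```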
For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.
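To show where the difficulty sits: the RL machinery itself is easy to write down; it's the reward that's the problem. A REINFORCE-style sketch (all names here hypothetical, PyTorch assumed, not any particular implementation):

```python
import torch

def communication_reward(tokens):
    # The crux: a scalar score for "meaningful, appropriate" language.
    # Nobody knows how to write this function.
    raise NotImplementedError

def reinforce_step(model, prompt, optimizer):
    # Sample a continuation and its per-token log-probs (hypothetical API).
    tokens, log_probs = model.sample(prompt)
    reward = communication_reward(tokens)  # the missing piece
    loss = -reward * log_probs.sum()       # REINFORCE policy-gradient estimate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```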
blazejd OP t1_ix7jekd wrote
I think u/Cheap_Meeting understood my question a bit better here. The end goal is to create an NLP model that can understand and communicate in natural language. This is why the main NLP benchmarks currently cover many different tasks. We use language models because they are the easier approach, but not necessarily the better one.
idrajitsc t1_ix8cpi8 wrote
Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar? Conceptually the RL approach might be better, but that doesn't mean it's at all doable.
blazejd OP t1_ix8ibmh wrote
>For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.
My understanding of this paragraph, simply put, is (correct me if I'm wrong): "RL might be better, but we don't know how to do it, so let's not try. Language models are doing fine."
In my opinion, in science we should focus simultaneously on easier problems that can lead to shorter-term gains (language models) AND ALSO more difficult problems that are riskier but might be better long term (RL-based).
idrajitsc t1_ix8lrk0 wrote
I mean, I'm not really sure what your ask is. People do work on RL for NLP. It just doesn't offer any huge advantage, and the reason your intuition doesn't translate into an actual advantage is that writing a reward function which reproduces the human feedback a baby receives is essentially impossible. And not just in an "it's hard, but if we put enough work into it we can figure it out" kind of way.
blazejd OP t1_ix8k4wf wrote
>Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar?
I realize this doesn't directly answer your question, so my point is that we don't know the answer, but we should at least try to pursue it.