idrajitsc t1_ix56883 wrote

I think that's still answered pretty well by their original comment: modeling probability distributions over sequences of words, given sufficient compute and good enough corpora, gets pretty close to the superficial aspects of language. And LLMs can now learn those distributions well, so why insist on RL instead?

For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.

2

blazejd OP t1_ix7jekd wrote

I think u/Cheap_Meeting understood my question a bit better here. The end goal is to create an NLP model that can understand and communicate in natural language, which is why the main NLP benchmarks currently cover many different tasks. We use language models because they're the easier approach, not necessarily the better one.

1

idrajitsc t1_ix8cpi8 wrote

Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar? Conceptually the RL approach might be better, but that doesn't mean it's at all doable.
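To make that concrete, here's a minimal sketch of the shape an RL approach to language would take (the toy vocabulary, function names, and the REINFORCE-style framing are mine, purely for illustration, not anything from this thread). The policy and rollout are easy to write; the hard part is the body of `reward`, which is exactly what's being asked about:

```python
import random

random.seed(0)

# Toy vocabulary; a real system would use a tokenizer's vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

def sample_utterance(policy, max_len=10):
    """Roll out a sequence of tokens from the policy until <eos> or max_len."""
    tokens = []
    while len(tokens) < max_len:
        token = policy(tokens)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

def reward(tokens):
    """The crux: a scalar score for 'understands and communicates'.
    Grammaticality has crude proxies; meaning and appropriateness do not.
    Placeholder heuristic below (utterance length) is obviously wrong --
    that's the point."""
    return float(len(tokens))

def random_policy(prefix):
    """Stand-in for a trainable policy network."""
    return random.choice(VOCAB)

# A REINFORCE-style update would score whole rollouts like this,
# then push the policy toward higher-reward utterances.
utterance = sample_utterance(random_policy)
score = reward(utterance)
```

Any heuristic you drop into `reward` (length, parse validity, n-gram overlap with references) trains the policy to game that heuristic, not to communicate, which is why the question keeps coming back to the reward function rather than the RL machinery.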

1

blazejd OP t1_ix8ibmh wrote

>For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.

My understanding of this paragraph, simply put, is (correct me if I'm wrong): "RL might be better, but we don't know how to do it, so let's not try. Language models are doing fine."

In my opinion, in science we should focus simultaneously on easier problems that can lead to shorter-term gains (language models) AND ALSO more difficult problems that are riskier but might be better long term (RL-based).

1

idrajitsc t1_ix8lrk0 wrote

I mean, I'm not really sure what your ask is. People do work on RL for NLP; it just doesn't offer any huge advantage. The reason your intuition doesn't translate into an actual advantage is that writing a reward function that reproduces the human feedback a baby receives is essentially impossible. And not just in an "it's hard, but if we put enough work into it we can figure it out" kind of way.

2

blazejd OP t1_ix8k4wf wrote

>Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar?

I realize this doesn't directly answer your question. My point is that we don't know the answer, but we should at least try to find it.

1