CellWithoutCulture

CellWithoutCulture t1_jcr9g0g wrote

If you want this to be included in the training corpus of future language models, please upvote it.

Why? Well, language models are trained on The Pile and Common Crawl. How do these datasets decide what to include? They look at Reddit upvotes, for one.

So you can influence what language models see in their formative years (although they might not look at this subreddit).
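For a concrete sense of what "looking at upvotes" means: WebText (and the OpenWebText2 component of The Pile) reportedly kept only outbound links from Reddit submissions with at least 3 karma. Here's a minimal sketch of that kind of filter; the function and data are illustrative, not the real pipeline:

```python
MIN_SCORE = 3  # the karma threshold WebText reportedly used

def filter_links(submissions):
    """Keep outbound URLs whose Reddit submission cleared the threshold.

    `submissions` is assumed to be an iterable of (url, score) pairs.
    """
    return [url for url, score in submissions if score >= MIN_SCORE]

# Usage with made-up data:
corpus_urls = filter_links([
    ("https://example.com/good-post", 12),
    ("https://example.com/ignored", 1),
])
# -> ["https://example.com/good-post"]
```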

4

CellWithoutCulture t1_ja6pjet wrote

Seems more like an AskML question.

But RL is for situations where you can't backprop through the loss. It's noisier than supervised learning, so if you can use supervised learning, that's generally what you should use.
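A toy PyTorch sketch of that difference (the model, data, and rewards are placeholders): with supervised learning the loss is differentiable end to end, while REINFORCE only sees a scalar reward and pays for it in gradient variance.

```python
import torch
import torch.nn as nn

policy = nn.Linear(4, 2)   # toy policy over 2 actions
x = torch.randn(8, 4)      # a batch of states

# Supervised: the loss is differentiable, so backprop goes straight through.
labels = torch.randint(0, 2, (8,))
sup_loss = nn.functional.cross_entropy(policy(x), labels)
sup_loss.backward()

# RL (REINFORCE): the reward is a black box, so we can't differentiate
# through it; instead we weight log-probs of sampled actions by reward.
policy.zero_grad()
dist = torch.distributions.Categorical(logits=policy(x))
actions = dist.sample()
rewards = torch.rand(8)    # pretend environment feedback
rl_loss = -(dist.log_prob(actions) * rewards).mean()  # high variance!
rl_loss.backward()
```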

RL is still used, though: for example in the recent Gato and DreamerV3, in training an LLM to use tools as in Toolformer, and in OpenAI's famous RLHF, which stands for reinforcement learning from human feedback. That's what they use to make ChatGPT "aligned", although in practice it doesn't quite get there.
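For a sense of what the "HF" part looks like mechanically, here's a toy sketch of the pairwise (Bradley-Terry) reward-model objective from the InstructGPT-style pipeline. The inputs are fake embeddings and `reward_model` is a stand-in; real RLHF scores full transcripts with a fine-tuned LM, then uses the reward model in a PPO stage.

```python
import torch
import torch.nn as nn

reward_model = nn.Linear(16, 1)   # stand-in for a scoring head
chosen = torch.randn(8, 16)       # embeddings of preferred responses
rejected = torch.randn(8, 16)     # embeddings of rejected responses

r_chosen = reward_model(chosen).squeeze(-1)
r_rejected = reward_model(rejected).squeeze(-1)

# Maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
loss = -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```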

12

CellWithoutCulture t1_j9noid8 wrote

Yeah, the jargon and meta-rambling is so annoying. It's like their first priority is to show off their brains, and their second priority is to align AGI. Now they're almost finished showing off their brains, so watch out, AGI.

Sometimes they behave in a silly fashion. The Greek philosophers had excellent logic and deduced all kinds of wrong things. These guys seem similar at times, trying to deduce everything with philosophy and mega brains.

IMO they're at their best when it's short-form and grounded in empirical data.

There's also a LessWrong podcast or two that will read out some of the longer posts.

4