
AllEndsAreAnds t1_j4997rv wrote

Those are two totally different AI architectures, though. You can't sweep from large language models to reinforcement learning agents and assume some kind of continuity.

Alignment and morals are not bloatware in a large language model, because the training data is human writing. The value we want to extract has to outweigh the harm the model is capable of generating, so it's prudent to prune off some paths in pursuit of a stable and valuable product to sell.

In a reinforcement learning model like AlphaZero, the training data comes from playing previous versions of itself. It has no need for morals because it doesn't operate on a moral landscape. That's not to say we won't ultimately want reinforcement learning agents operating in a moral landscape - we will - but those agents, too, will be trained within a social and moral landscape where alignment is necessary to accomplish their goals.

As a society, we can afford bloatware. We likely cannot afford the alternative.
