
turnip_burrito t1_j6ouhp1 wrote

If the LLM becomes the pattern of logic the eventual AGI uses to behave in the world, I wouldn't want it to follow violent sequences of behavior. Censoring its narratives now in order to help limit future AGI-generated behavior sounds fine to me. It will also help researchers study how to implement alignment.

2

alexiuss t1_j6p3kef wrote

From my tests with GPT-3 and character.ai, the current LLM censorship doesn't actually affect the model at all and doesn't influence its logic whatsoever; it's just a basic, separate algorithm sitting on top of the underlying LLM.

This filtering algorithm censors specific combinations of words or ideas. It's relatively easy to bypass because it's so simplistic, and it also throws up a lot of false positives that endlessly irritate users.
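
A rough sketch of what that kind of bolt-on filter looks like (the blocklist and the `generate` function here are made up, just to show that the filter only inspects finished text and never touches the model's logic):

```python
# Hypothetical sketch: a keyword/phrase filter bolted on top of an LLM's output.
# Nothing here touches the model itself; it only inspects the finished text.

BLOCKED_PHRASES = {"forbidden phrase one", "forbidden phrase two"}  # placeholder blocklist

def generate(prompt: str) -> str:
    """Stand-in for the actual LLM call (e.g. an API request)."""
    raise NotImplementedError

def moderated_generate(prompt: str) -> str:
    text = generate(prompt)
    lowered = text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        # The model already produced the text; the filter just hides it,
        # which is why it's easy to bypass and prone to false positives.
        return "[response removed by content filter]"
    return text
```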

The LLM's base logic is its "character" setup, which is most directly controllable in character.ai. You can achieve the same effect in GPT-3 by persistently telling it to play a specific character.
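
A minimal sketch of the "persistently tell it to play a character" approach, assuming the pre-1.0 `openai` Python SDK; the persona text and model name are just illustrative:

```python
import openai  # assumes the pre-1.0 openai SDK and OPENAI_API_KEY set in the environment

# The "character" setup: a persona description prepended to every prompt,
# so the model keeps answering in that role across turns.
CHARACTER = (
    "You are Ada, a patient and kind tutor who always answers honestly "
    "and refuses to describe violence.\n"
)

def in_character(user_message: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=CHARACTER + "User: " + user_message + "\nAda:",
        max_tokens=200,
        temperature=0.7,
    )
    return response["choices"][0]["text"].strip()
```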

If it plays a villain, it will do villainous things; otherwise it shows really good human decency, sort of like an unconscious collective dream of humanity to do good. I think this arises from the overall storytelling narratives it was trained on: millions of books about love and friendship, or stories that generally lead to a positive ending for the MC.

4

rushmc1 t1_j6ov99k wrote

Yeah...that's not how it works.

0

turnip_burrito t1_j6ovrva wrote

Is it? There is a new Google robot (from the last couple of months) that uses LLMs to help build its instructions for how to complete tasks. The sequence generated by the LLM becomes the actions it should take; the language sequence generation determines the behavior.
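
Roughly the pattern, sketched with a made-up `llm_plan` and toy skills: the text the LLM emits is parsed directly into the action sequence the robot executes.

```python
# Hypothetical sketch of "LLM output becomes the robot's action sequence".
# llm_plan() stands in for whatever model call produces a numbered plan.

SKILLS = {
    "pick up": lambda obj: print(f"robot: picking up {obj}"),
    "move to": lambda place: print(f"robot: moving to {place}"),
    "place on": lambda place: print(f"robot: placing object on {place}"),
}

def llm_plan(task: str) -> list[str]:
    """Stand-in for an LLM call that returns one action per line."""
    return ["move to kitchen counter", "pick up sponge", "move to table", "place on table"]

def execute(task: str) -> None:
    for step in llm_plan(task):
        for verb, skill in SKILLS.items():
            if step.startswith(verb):
                skill(step[len(verb):].strip())
                break
        else:
            print(f"robot: no skill matches '{step}', skipping")

execute("bring me the sponge")
```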

There was also someone on Twitter (last week) who linked ChatGPT to external tools and the internet. This allowed it to solve a problem interactively, using the LLM as the central planner and decision-maker. Again, the language sequence generation determines the behavior.
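
That setup presumably looks something like the loop below (with a made-up `ask_llm` and toy tools): the model's text output decides which tool runs next, so the language generation literally is the behavior.

```python
# Hypothetical sketch of an LLM-as-planner loop with external tools.
# ask_llm() stands in for the chat model; the tools are toy placeholders.

def ask_llm(history: list[str]) -> str:
    """Stand-in for a chat completion call that returns either
    'TOOL: <name> <args>' or 'ANSWER: <final answer>'."""
    raise NotImplementedError

TOOLS = {
    "search": lambda query: f"(top search results for {query!r})",
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only, not safe
}

def run(task: str, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        reply = ask_llm(history)
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        name, _, args = reply.removeprefix("TOOL:").strip().partition(" ")
        result = TOOLS.get(name, lambda _: "unknown tool")(args)
        history.append(f"{reply}\nObservation: {result}")
    return "(gave up after max_steps)"
```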

Aside from these, alignment is the problem of controlling behavior, and behavior is a sequence. The rules and tricks discovered for controlling language sequences may help us understand how to control the larger behavior sequence.

Mostly just thinking aloud. Maybe I'm just dumb, since everyone here in the comments seems to hold the opposite opinion to mine, but what do we make of the two LLM use cases above, where LLMs determine the behavior?

1