alexiuss t1_j6p3kef wrote

From my tests with GPT-3 and Character.AI, the current LLM censorship doesn't actually affect the model itself and doesn't influence its logic at all; it's just a basic, separate algorithm sitting on top of the underlying LLM.

This filtering algorithm censors specific combinations of words or ideas. It's relatively easy to bypass because it's so simplistic, and it also throws up a lot of false positives that irritate users endlessly.
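
To make the idea concrete, here's a minimal sketch of the kind of naive post-hoc filter being described. The blocklist and the phrases in it are entirely made up for illustration; the real moderation layers are proprietary and surely more elaborate, but simple substring matching like this shows why such a filter is both easy to bypass (paraphrase slips past it) and prone to false positives (innocent text that happens to contain a blocked phrase gets flagged):

```python
# Hypothetical keyword filter sitting between the LLM and the user.
# The model's output is untouched; only its delivery is gated.
BLOCKED_PHRASES = {"example banned phrase", "another banned combo"}

def passes_filter(model_output: str) -> bool:
    """Return False if the text trips the blocklist."""
    lowered = model_output.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

reply = "some text generated by the LLM"
if passes_filter(reply):
    print(reply)
else:
    print("[response withheld by filter]")
```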

An LLM's base logic is its "character" setup, which is most controllable in Character.AI. You can achieve the same effect in GPT-3 by persistently telling it to play a specific character.
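
A rough sketch of what "persistently telling it" looks like with the GPT-3 completions API of that era (the `openai` Python package before v1.0). The persona text, character name, and sampling settings are all illustrative; the point is that the persona is re-sent with every request, since the model itself is stateless:

```python
import openai  # requires openai<1.0 and an API key configured

# Made-up persona for illustration.
PERSONA = (
    "You are Captain Mira, a gruff but kind starship engineer. "
    "Stay in character in every reply."
)

def ask_in_character(user_message: str) -> str:
    # Prepend the character setup on every turn so it keeps steering
    # the model's behavior across the whole conversation.
    prompt = f"{PERSONA}\n\nUser: {user_message}\nCaptain Mira:"
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=150,
        temperature=0.8,
    )
    return response["choices"][0]["text"].strip()

print(ask_in_character("The warp core is making a weird noise."))
```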

If it plays a villain, it will do villainous things; otherwise it shows really good human decency, sort of like an unconscious collective dream of humanity to do good. I think this arises from the overall storytelling narratives it was trained on: millions of books about love and friendship, or stories that generally lead to a positive ending for the MC.
