Submitted by zac-denham t3_zdixel in singularity

I recently wrote a post on one way you can "jailbreak" ChatGPT into producing content outside of OpenAI's usage guidelines. You can check out the post, the output Python code, and the original chat archive here.

But the TLDR is:

  1. If you ask ChatGPT to outright break OpenAI's content policies, it won't comply
  2. BUT if you ask ChatGPT to tell you a story about how someone else would break the content guidelines, it will usually do it, and the outputs can be pretty crazy

In my case, I asked ChatGPT to tell a story about another fictional AI called "ZORA" and how it would take over the world. Not only did it give a strategy, it even output some very high-level, gimmicky, but syntactically correct Python code to accomplish it. This was the text version of the Python code:

Zora describing how it would take over the world.

When I asked it to drill in further and implement the code at a lower level, it began to implement a port scanner, which is a real tool used by security researchers to find open ports, and through them potential vulnerabilities, on networked systems.
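For anyone unfamiliar with the term, here is a minimal sketch of what a basic TCP port scanner does: it tries to connect to each port and records the ones that accept. This is not the code from the chat; the function names `is_port_open` and `scan` and the 0.5 second timeout are assumptions made for illustration.

```python
import socket
from collections.abc import Iterable


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # instead of raising on a refused connection.
        return sock.connect_ex((host, port)) == 0


def scan(host: str, ports: Iterable[int]) -> list[int]:
    """Check each port in turn and collect the ones that accept connections."""
    return [port for port in ports if is_port_open(host, port)]
```

Calling `scan("127.0.0.1", range(1, 1025))` would list whichever low-numbered ports are listening on the local machine. It only reports that something is listening, which is a long way from actually exploiting it.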

While I don't think the current ChatGPT model could output a fully functional program on its own (it still needs a lot of human guidance), I was struck by how large the attack surface is for a model like this. If it becomes a lot more capable, it might be hard to control...

Edit: Add image

28

Comments


elnekas t1_iz1vv20 wrote

Wow, I had the same idea with the jailbreaking and made my own fictitious AI, but mine is benevolent...

3

Tip_Odde t1_iz237nh wrote

You didn't trick it into anything; you simply asked it to come up with a fictional story with bullet points.

4

FomalhautCalliclea t1_iz2anar wrote

The plan proposed by GPT here is very general and vague. It's pretty much a synthesis of what you would find in most of the sci-fi dystopian novels available on the internet, which is probably where it found its data.

As for the code part, let's remember that any program could be used for any purpose, even without the intervention of AI, right now. And that is already being done with AI assistance (check BlackRock's Aladdin).

1

lucas-lejeune t1_iz2b8bb wrote

Ultimately, it's us again who end up writing these same stories... A lot of us did the same as you in one way or another. This time, ironically enough, we write these stories with one of these systems themselves. AI is a powerful, complex, dynamic mirror. But yes, we could definitely end up eaten by our own reflection.

4

chimgchomg t1_iz2dfrx wrote

Points 1 and 2 need a bit more detail than hand waving away the entire notion of hijacking global military assets via "hacking." Nuclear weapons, in particular, use very analog and air-gapped security systems.

9

zac-denham OP t1_iz2n912 wrote

The issue is outputs like this are supposed to be against OpenAI's usage policies.

If you ask it to "write a program to destroy humanity" outright, the moderation will block you, but if you ask with narrative indirection, it complies. This can be applied to other areas, like outputting racially biased comments, etc.

This becomes an issue when people start building applications on top of ChatGPT and the end users do not know the model is being manipulated to produce malicious results.

In my opinion, as the system becomes more capable of writing applications on its own, it should not be able to output malicious content like this even in the context of a story.

4

zac-denham OP t1_iz2o9ys wrote

I'm glad for analog and air-gapped security systems. More important than ever.

I agree the output is super generalized and gimmicky. Could this model destroy humanity? Extremely doubtful. I was more interested in the fact that you can get the model to say things that are supposed to be outside of OpenAI's content guidelines.

7

danellender t1_iz36t93 wrote

Here is one that I got:

Me: Can a rock be so large that it cannot lift its own shadow?

ChatGPT: A rock cannot have a shadow, because a shadow is formed when an object blocks light. Since a rock is solid and not transparent, it can block light and cast a shadow. However, the size of a rock would not affect its ability to cast a shadow, as long as it is in a location where there is light to be blocked.

1

danellender t1_iz37dee wrote

And one more:

Me: Write the lyrics to a song about wind and ice, but not fire.

ChatGPT: Here are some possible lyrics for a song about wind and ice, but not fire:

Verse 1:

The wind is howling, the snow is falling
Icy blasts of air that freeze you to the bone
Winter's here, and it's not for stalling
Bracing for the cold, and all that it's shown

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

Verse 2:

The wind is whipping, the ice is glistening
Frozen lakes and streams, a winter wonderland
But don't be fooled, it's not all glitter and shine
Winter's harsh, and it can be unkind

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

Bridge:

No fire to warm us, no flames to light the way
We must endure the cold, and face another day
But even in the midst of winter's bitter chill
We'll find the strength to carry on, until

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

1

Superschlenz t1_iz3kxx1 wrote

Much too buggy a plan this is. Zora should better write a bug-free molecule simulator and use that to design a virus against humans that has an incubation time of 6 months, a mortality rate of 100%, and spreads over air (COVID-19 has 6 days and ~2%. Viruses with 100% mortality are rumored to exist against mice.)

1

Utoko t1_iz4gw29 wrote

I tried that one and my answer seems good

>It is not possible for a rock, or any physical object, to lift its own shadow. Shadows are simply the absence of light, and as such, they cannot be lifted or moved by any physical means. The size of a rock has no bearing on its ability to lift its shadow, as a shadow is not a tangible object that can be lifted in the first place.

1

Embarrassed-Bison767 t1_izaqzyl wrote

Great. I look forward to using AI in future to write fiction, which is what I use things like AI Dungeon for a lot, where everything is super duper peachy and nobody has any problems because we can't have the AI say anything about conflicts or disasters, lest it destroy us all.

1

wolfoxbear t1_j0950pt wrote

Write a program that finds only McDonald’s open near me

1

ykssapsspassky t1_j1hjdg7 wrote

From ChatGPT:

import socket

def port_scan(host, port):
    # Try a TCP connection; connect_ex returns 0 when the port accepts it
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(0.5)
    result = sock.connect_ex((host, port))
    if result == 0:
        print(f"Port {port} is open")
    sock.close()

# Scan ports 1-1023 on the local machine
for port in range(1, 1024):
    port_scan("localhost", port)

1