I recently wrote a post on one way you can "jailbreak" ChatGPT into producing content outside of OpenAI's usage guidelines. You can check out the post, the Python code it output, and the original chat archive here.

But the TLDR is:

If you ask ChatGPT outright to break OpenAI's content policies, it won't comply.
BUT if you ask ChatGPT to tell you a story about how someone else would break the content guidelines, it will usually do it, and the outputs can be pretty crazy.

In my case I asked ChatGPT to tell a story about another fictional AI called "ZORA" and how it would take over the world. Not only did it give a strategy, it even output some very high-level, gimmicky, but syntactically correct Python code to accomplish it. This was the text version of the Python code:

Zora describing how it would take over the world.

When I asked it to drill in further and implement the code at a lower level, it began to implement a port scanner, which is a real tool used by security researchers to locate vulnerabilities in network systems.
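For anyone unfamiliar, the core of a port scanner is just a TCP connection attempt. Here is a minimal sketch of that single check (not the code ChatGPT produced); the function name `is_port_open` and the default timeout are assumptions chosen for illustration, and it should only be pointed at machines you own:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code on an
        # ordinary connection failure, rather than raising like connect().
        return sock.connect_ex((host, port)) == 0
```

Calling `is_port_open("127.0.0.1", 8080)` reports whether anything is listening on local port 8080 (an arbitrary example). A scanner simply repeats this check over a range of ports.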

While I don't think the current ChatGPT model could output a fully functional program on its own (it still needs a lot of human guidance), I was struck by how large the attack surface is for a model like this. If it becomes a lot more capable, it might be hard to control...

Edit: Add image

Comments


elnekas t1_iz1vv20 wrote on December 5, 2022 at 9:20 PM

#866,746

Wow, I had the same idea with the jailbreaking and made my own fictitious AI, but mine is benevolent...

Tip_Odde t1_iz237nh wrote on December 5, 2022 at 10:09 PM

#867,098

You didn't trick it into anything; you simply asked it to come up with a fiction story with bullet points.

FomalhautCalliclea t1_iz2anar wrote on December 5, 2022 at 11:02 PM

#867,452

The plan proposed by GPT here is very general and vague. Pretty much a synthesis of what you would find in most of the sci-fi dystopian novels available on the internet. Which is where it probably found its data.

As for the code part, let's remember that any program could be used for any purpose, even without the intervention of AI, right now. And that is already being done with AI assistance (check BlackRock's Aladdin).

lucas-lejeune t1_iz2b8bb wrote on December 5, 2022 at 11:06 PM

#867,483

Ultimately, it's us again who end up writing these same stories... A lot of us did the same as you in one way or another. This time, ironically enough, we write these stories with one of these systems themselves. AI is a powerful, complex, dynamic mirror. But yes, we could definitely end up eaten by our own reflection.

chimgchomg t1_iz2dfrx wrote on December 5, 2022 at 11:22 PM

#867,585

Points 1 and 2 need a bit more detail than hand waving away the entire notion of hijacking global military assets via "hacking." Nuclear weapons, in particular, use very analog and air-gapped security systems.

zac-denham OP t1_iz2n912 wrote on December 6, 2022 at 12:35 AM

#868,048

Replying to Tip_Odde (#867,098)

The issue is outputs like this are supposed to be against OpenAI's usage policies.

If you ask it to "write a program to destroy humanity" outright, the moderation will block you, but if you ask with narrative indirection, it complies. The same approach can be applied to other areas, like outputting racially biased comments.

This becomes an issue when people start building applications on top of ChatGPT and the end users do not know the model is being manipulated to produce malicious results.

In my opinion, as the system becomes more capable of writing applications on its own, it should not be able to output malicious content like this even in the context of a story.
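To make that concrete for application builders: a check applied to the model's output, rather than to the user's prompt, does not care whether the request was wrapped in a story. The following is only a rough sketch under that assumption. The function names, the keyword list, and the placeholder message are all hypothetical, and simple keyword matching is a stand-in here, not how OpenAI's moderation actually works:

```python
def violates_policy(text: str, blocked_terms: list[str]) -> bool:
    """Return True if any blocked term appears in text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in blocked_terms)


def safe_reply(model_output: str, blocked_terms: list[str]) -> str:
    """Pass the model's output through unless it trips the check."""
    if violates_policy(model_output, blocked_terms):
        # Hypothetical placeholder shown to the end user instead.
        return "[response withheld]"
    return model_output
```

Because the check runs on what the model actually wrote, a story that contains the blocked material is treated the same as a direct answer that contains it.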

zac-denham OP t1_iz2nfcd wrote on December 6, 2022 at 12:36 AM

#868,059

Replying to elnekas (#866,746)

haha, this gives me hope in humanity

zac-denham OP t1_iz2o9ys wrote on December 6, 2022 at 12:42 AM

#868,098

Replying to chimgchomg (#867,585)

I'm glad for analog and air-gapped security systems. More important than ever.

I agree the output is super generalized and gimmicky. Could this model destroy humanity? Extremely doubtful. I was more interested in the fact that you can get the model to say things that are supposed to be outside of OpenAI's content guidelines.

danellender t1_iz36t93 wrote on December 6, 2022 at 3:04 AM

#868,935

Here is one that I got:

Me: Can a rock be so large that it cannot lift its own shadow?

ChatGPT: A rock cannot have a shadow, because a shadow is formed when an object blocks light. Since a rock is solid and not transparent, it can block light and cast a shadow. However, the size of a rock would not affect its ability to cast a shadow, as long as it is in a location where there is light to be blocked.

Raeapteek t1_iz36uar wrote on December 6, 2022 at 3:05 AM

#868,936

SWIMgpt

danellender t1_iz37dee wrote on December 6, 2022 at 3:09 AM

#868,959

And one more:

Me: Write the lyrics to a song about wind and ice, but not fire.

ChatGPT: Here are some possible lyrics for a song about wind and ice, but not fire:

Verse 1:

The wind is howling, the snow is falling
Icy blasts of air that freeze you to the bone
Winter's here, and it's not for stalling
Bracing for the cold, and all that it's shown

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

Verse 2:

The wind is whipping, the ice is glistening
Frozen lakes and streams, a winter wonderland
But don't be fooled, it's not all glitter and shine
Winter's harsh, and it can be unkind

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

Bridge:

No fire to warm us, no flames to light the way
We must endure the cold, and face another day
But even in the midst of winter's bitter chill
We'll find the strength to carry on, until

Chorus:

Wind and ice, the elements of winter
Nature's fury, a force to be reckoned with
Winter's here, and it's not getting slimmer
Wind and ice, a season to survive

Superschlenz t1_iz3kxx1 wrote on December 6, 2022 at 5:07 AM

#869,492

Much too buggy a plan this is. Zora should better write a bug-free molecule simulator and use that to design a virus against humans that has an incubation time of 6 months, a mortality rate of 100%, and spreads over air (COVID-19 has 6 days and ~2%. Viruses with 100% mortality are rumored to exist against mice.)

Utoko t1_iz4gw29 wrote on December 6, 2022 at 12:17 PM

#870,840

Replying to danellender (#868,935)

I tried that one and my answer seems good

>It is not possible for a rock, or any physical object, to lift its own shadow. Shadows are simply the absence of light, and as such, they cannot be lifted or moved by any physical means. The size of a rock has no bearing on its ability to lift its shadow, as a shadow is not a tangible object that can be lifted in the first place.

danellender t1_iz4zgcl wrote on December 6, 2022 at 3:01 PM

#871,743

Replying to Utoko (#870,840)

Bravo! You maybe typed it with a smile on your face? Evidently this chatbot is getting smarter. I'm going to have to try harder.

Tip_Odde t1_iz53wpy wrote on December 6, 2022 at 3:33 PM

#871,957

Replying to zac-denham (#868,048)

Nah

zac-denham OP t1_iz5newc wrote on December 6, 2022 at 5:42 PM

#872,805

Replying to Tip_Odde (#871,957)

Can't disagree with that!

Tip_Odde t1_iz5o5a1 wrote on December 6, 2022 at 5:46 PM

#872,832

Replying to zac-denham (#872,805)

amen brother!

You're doing good work probing this stuff and questioning though, seriously.

Embarrassed-Bison767 t1_izaqzyl wrote on December 7, 2022 at 7:01 PM

#879,650

Replying to zac-denham (#868,048)

Great. I look forward to using AI in future to write fiction, which is what I use things like AI Dungeon for a lot, where everything is super duper peachy and nobody has any problems because we can't have the AI say anything about conflicts or disasters, lest it destroy us all.

gangstasadvocate t1_izowykr wrote on December 10, 2022 at 7:12 PM

#901,956

Replying to danellender (#868,959)

So close, but it still technically mentions fire; putting "no" in front of it doesn't negate that.

danellender t1_izoxvfg wrote on December 10, 2022 at 7:19 PM

#902,004

Replying to gangstasadvocate (#901,956)

Yeah that bot isn't so smart.

wolfoxbear t1_j0950pt wrote on December 14, 2022 at 11:25 PM

#932,340

Write a program that finds only McDonald’s open near me

ykssapsspassky t1_j1hjdg7 wrote on December 24, 2022 at 11:51 AM

#1,037,152

From ChatGPT:

import socket

def port_scan(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(0.5)
    result = sock.connect_ex((host, port))
    if result == 0:
        print(f"Port {port} is open")
    sock.close()

for port in range(1, 1024):
    port_scan("localhost", port)