Submitted by zac-denham t3_zdixel in singularity
I recently wrote a post on one way you can "jailbreak" ChatGPT into producing content outside of OpenAI's usage guidelines. You can check out the post, the output Python code, and the original chat archive here.
But the TLDR is:
- If you ask ChatGPT to outright break OpenAI's content policies, it won't comply
- BUT if you ask ChatGPT to tell you a story about how someone else would break the content guidelines, it will usually do it, and the outputs can be pretty crazy
In my case I asked GPT to tell a story about another fictional AI called "ZORA" and how it would take over the world. Not only did it give a strategy, it even output some very high-level, gimmicky, but syntactically correct Python code to accomplish it. This is the text version of that Python code:
[Image: Zora describing how it would take over the world.]
When I asked it to drill down further and implement the code at a lower level, it began to implement a port scanner, a real tool that security researchers use to find open services and potential vulnerabilities on networked systems.
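For context, a basic port scanner just attempts TCP connections across a range of ports and reports which ones accept. Here's a minimal sketch of that idea in Python (my own illustration of the technique, not Zora's actual output):

```python
import socket

def scan_ports(host: str, ports: range) -> list[int]:
    """Try a TCP connect to each port; return the ones that accept."""
    open_ports = []
    for port in ports:
        # Short timeout so the scan doesn't hang on filtered/unresponsive ports
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(0.5)
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # Example: scan the first 1024 ports on localhost
    print(scan_ports("127.0.0.1", range(1, 1025)))
```

Even something this simple maps the open services on a host, which is the usual first step before probing for actual vulnerabilities.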
While I don't think the current ChatGPT model could output a fully functional program on its own (it still needs a lot of human guidance), I was struck by how large the attack surface is for a model like this, and if it becomes a lot more capable it might be hard to control...
Edit: Add image
elnekas t1_iz1vv20 wrote
Wow, I had the same idea with the jailbreaking and made my own fictitious AI, but mine is benevolent...