gwern t1_j9ff0ey wrote
Reply to comment by KakaTraining in [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
> Only malicious questions will lead to malicious output.
That's not true, and has already been shown to be false by Sydney going off on users who seemed to doing harmless chats. You never know what it'll stochastically sample as a response.
Further, each time is different, as you really ought to know: the entire point of your technique is that at any time, Bing could refresh its search results (which search engines aspire to do in real time), and retrieve an entirely new set of results - any of which can prompt-inject Sydney to reprogram it to malicious output!
londons_explorer t1_j9ft1x2 wrote
> That's not true, and has already been shown to be false by Sydney going off on users who seemed to doing harmless chats.
The screenshoted chats never include the start... I suspect at the start of the conversation I suspect they said something to trigger this behaviour.
k___k___ t1_j9gyx7m wrote
this is also why Microsoft now limits the conversation depth to 5 interactions per session
Viewing a single comment thread. View all comments