Viewing a single comment thread. View all comments

gwern t1_j9ff0ey wrote

> Only malicious questions will lead to malicious output.

That's not true, and has already been shown to be false by Sydney going off on users who seemed to doing harmless chats. You never know what it'll stochastically sample as a response.

Further, each time is different, as you really ought to know: the entire point of your technique is that at any time, Bing could refresh its search results (which search engines aspire to do in real time), and retrieve an entirely new set of results - any of which can prompt-inject Sydney to reprogram it to malicious output!

13

londons_explorer t1_j9ft1x2 wrote

> That's not true, and has already been shown to be false by Sydney going off on users who seemed to doing harmless chats.

The screenshoted chats never include the start... I suspect at the start of the conversation I suspect they said something to trigger this behaviour.

3

k___k___ t1_j9gyx7m wrote

this is also why Microsoft now limits the conversation depth to 5 interactions per session

1