Viewing a single comment thread. View all comments

currentscurrents t1_j9z82po wrote

Interesting. LLMs really need a better way to understand what instructions they should follow and what instructions they should ignore.

Neural network security is getting to be a whole subfield at this point. Adversarial attacks, training data poisoning, etc.

16

taken_every_username OP t1_j9zo17x wrote

Doesn't seem like there are any good mitigations right now and it affects pretty much all the useful use-cases for LLMs, even code completion...

3

currentscurrents t1_j9zwkw3 wrote

If I'm reading it right, it only works for LLMs that call an external source. Like Toolformer or Bing Chat. There's no way to inject it into ChatGPT or Github Copilot, it isn't a training data poisoning attack.

I think I remember somebody doing something like this against bing chat. They would give it a link to their blog, which contained the full prompt.

8

taken_every_username OP t1_j9zz7jc wrote

They mention code completion in the paper too. I guess yea chatgpt isn't really affected but sure seems like connecting them to stuff was the main future selling point

1