Submitted by taken_every_username t3_11bkpu3 in MachineLearning
currentscurrents t1_j9z82po wrote
Interesting. LLMs really need a better way to understand what instructions they should follow and what instructions they should ignore.
Neural network security is getting to be a whole subfield at this point. Adversarial attacks, training data poisoning, etc.
taken_every_username OP t1_j9zo17x wrote
Doesn't seem like there are any good mitigations right now and it affects pretty much all the useful use-cases for LLMs, even code completion...
currentscurrents t1_j9zwkw3 wrote
If I'm reading it right, it only works for LLMs that call an external source. Like Toolformer or Bing Chat. There's no way to inject it into ChatGPT or Github Copilot, it isn't a training data poisoning attack.
I think I remember somebody doing something like this against bing chat. They would give it a link to their blog, which contained the full prompt.
taken_every_username OP t1_j9zz7jc wrote
They mention code completion in the paper too. I guess yea chatgpt isn't really affected but sure seems like connecting them to stuff was the main future selling point
Viewing a single comment thread. View all comments