artsybashev t1_j5fh1m8 wrote on January 22, 2023 at 5:15 PM

I wonder if in 50 years, the LLM models are able to produce "viruses" that cause problems in competing models. Like AI hacking the other AI through injecting disruptive training data to the enemy training procedure.

EmmyNoetherRing t1_j5fhpjq wrote on January 22, 2023 at 5:19 PM

There’s almost an analogy here to malicious influence attacks aimed at radicalizing people. You have to inundate them with a web of targeted information/logic to gradually change their worldview.

artsybashev t1_j5fhyex wrote on January 22, 2023 at 5:21 PM

Infowars gets a new meaning in 10 years

hiptobecubic t1_j5jaw2x wrote on January 23, 2023 at 12:14 PM

It's the same meaning.

artsybashev t1_j5jj7fi wrote on January 23, 2023 at 1:35 PM

no

hiptobecubic t1_j5nfprv wrote on January 24, 2023 at 6:04 AM

.. yes?

ardula99 t1_j5fsdsw wrote on January 22, 2023 at 6:27 PM

That is what adversarial data points are -- people have discovered that it is possible to "confuse" image models using attacks such as these. Take a normal picture of cat, feed it into an image model and it'll label it correctly and say - hey, I'm 97% sure there's a cat in this. Change a small number of pixels using some algorithm (say <1% of the entire image) - to a human, it will still look like a cat, but an image model now thinks it's a stop sign (or something equally unlikely) with 90%+ probability.

EmmyNoetherRing t1_j5g8ogy wrote on January 22, 2023 at 8:12 PM

So, not quite. You’re describing funny cases that a trained classifier will misclassify.

We’re talking about what happens if you can intentionally inject bias into an AI’s training data (since it’s pulling that data from the web, if you know where it’s pulling from you can theoretically influence how it’s trained). That would potentially cause it to misclassify many cases (or have other more complex issues). It starts to be weirdly slightly feasible if you think about a future where a lot of online content is generated by AI— but we have at least two competing companies/governments supplying those AI.

Say we’ve got two AI’s, A & B. A can use secret proprietary watermarks to recognize its own text online and avoid using that text in its training data (it wants to train on human data). And of course AI B can do the same thing, to recognize its own text. But since each AI is using its own secret watermarks, there’s no good way to prevent A from accidentally training on B’s output. And vice versa.

The AI’s are supposed to only train on human data, to be more like humans. But maybe there will be a point where they unavoidably start training on each other. And then if there’s a malicious actor, they might intentionally use their AI to flood a popular public text data source with content that, if the other AI ingest it, will cause them to behave in a way that the actor wants (biased against their targets, or biased positively for the actor).

Effectively, at some point we may have to deal with people secretly using AI to advertise to, radicalize, or scam other AI. Unless we get some fairly global regulations up in time. Should be interesting.

I wonder to what extent we’ll manage to get science fiction out about these things before we start seeing them in practice.

ISvengali t1_j5hlozp wrote on January 23, 2023 at 1:34 AM

> I wonder to what extent we’ll manage to get science fiction out about these things before we start seeing them in practice.

Its not an exact match, but reminds me quite a lot of Snow Crash

e-rexter t1_j5i49g4 wrote on January 23, 2023 at 3:49 AM

Great book. Required reading back in the mid 90s when I worked at WIRED.

e-rexter t1_j5i42p1 wrote on January 23, 2023 at 3:48 AM

Reminds me of the movie multiplicity, in which each copy gets dumber.

Mushroom_Philatelist t1_j5gw4zt wrote on January 22, 2023 at 10:40 PM

https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/main/notebooks/README.md

watchsnob t1_j5fl99s wrote on January 22, 2023 at 5:42 PM

couldn't they just log all of their own outputs and check against it?

Advanced-Hedgehog-95 t1_j5fmjo5 wrote on January 22, 2023 at 5:50 PM

That's not a practical option for obvious reasons

ardula99 t1_j5fsj2m wrote on January 22, 2023 at 6:28 PM

Not scalabale long term, especially once they start selling to clients. Clients will have privacy and security issues with OpenAI having full access (and logging history) of all their previous queries.

Acceptable-Cress-374 t1_j5ivlhe wrote on January 23, 2023 at 8:50 AM

> You don’t want the thing talking to itself.

Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" a la AlphaZero? Would it converge to understandable language or would it diverge into babllbe-speak?

[D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models?

Advanced-Hedgehog-95 t1_j5es27r wrote on January 22, 2023 at 2:13 PM

EmmyNoetherRing t1_j5ffiqv wrote on January 22, 2023 at 5:05 PM