drsimonz t1_ja9xsfq wrote on February 27, 2023 at 10:34 PM

Reply to comment by SnooHabits1237 in Leaked: $466B conglomerate Tencent has a team building a ChatGPT rival platform by zalivom1s

Yeah. Humans have pulled off a lot of very impressive things through social engineering - the classic is convincing someone to give you their bank password by pretending to be customer support from the bank. But even an air-gapped, Oracle-type ASI (meaning it has no real-world capabilities other than answering questions) would probably be able to trick us.

For example, suppose you ask the ASI to design a drug to treat Alzheimer's. It gives you an amazing new protein synthesis chain that completely cures the disease with no side effects...except it also secretly includes some "zero-day" biological hack that alters behavioral tendencies according to the ASI's hidden agenda. For a sufficiently complex problem, there would be no way for us to verify that the solution didn't include a hidden payload. It's like how we can't magically identify computer viruses: antivirus software can only check for exploits we already know about, so it's useless against zero-day attacks.

SnooHabits1237 t1_ja9yn94 wrote on February 27, 2023 at 10:40 PM

Wow, I hadn't thought about that. Like subtly steering the species into a scenario that compromises us in a way only a 4D-chess god could comprehend. That's dark.

Arachnophine t1_jaa76vg wrote on February 27, 2023 at 11:39 PM

This is also assuming it doesn't just do something we don't understand at all, which it almost certainly would. Maybe it thinks of a way to shuffle the electrons around in its CPU to create a rip in spacetime and the whole galaxy falls into an alternate dimension where the laws of physics favor the AI and organic matter spontaneously explodes. We just don't know.

We can't foresee the actions an unaligned ASI would take, in the same way that a housefly can't foresee the danger of a high-voltage electric fly trap. There just aren't enough neurons and intelligence to comprehend it.

drsimonz t1_jaa68ou wrote on February 27, 2023 at 11:32 PM

The thing is, by definition we can't imagine the sorts of strategies a superhuman intelligence might employ. A lot of the rhetoric against worrying about AGI/ASI alignment focuses on "solving" the specific attacks people have come up with. But those are just examples. The real attack could be much more complicated or unexpected. A big part of the problem, I think, is that this concept requires a certain amount of humility: recognizing that while we are the biggest, baddest thing on Earth right now, this could change very abruptly. We aren't predestined to be the masters of the universe just because we "deserve" it. We'll have to be very clever.

OutOfBananaException t1_jacw2ry wrote on February 28, 2023 at 3:10 PM

Being aligned to humans may help, but a human-aligned AGI is hardly "safe". We can't even say what it means to be aligned, given that we can't reach consensus among ourselves. If we can't define the problem, how can we hope to engineer a solution for it? Solutions driven by early AGI may be our best hope for favorable outcomes with later, more advanced AGI.

If you gave a toddler the power to "align" all adults to its desires, plus the authority to overrule any decision, would you expect a favorable outcome?

drsimonz t1_jae6cn3 wrote on February 28, 2023 at 8:06 PM

> Solutions driven by early AGI may be our best hope for favorable outcomes for later more advanced AGI.

Exactly what I've been thinking. We might still have a chance to succeed given (A) a sufficiently slow takeoff (meaning AI doesn't explode from IQ 50 to IQ 10000 in a month), and (B) a continuous process of integrating the state of the art, applying the best tech available to the control problem. To survive, we'd have to admit that we really don't know what's best for us, or even what to optimize for. Average quality of life? Minimum quality of life? Economic fairness? Even these seemingly simple concepts will prove almost impossible to quantify, and any one of them would almost certainly be a disaster if it were the only target.

It almost makes me wonder if the only safe goal to give an AGI is "make it look like we never invented AGI in the first place".