Liberty2012 OP t1_jadnx3q wrote
Reply to comment by phaedrux_pharo in Is the intelligence paradox resolvable? by Liberty2012
Certainly there is a spectrum of behavior that we would deem allowable or not. However, that is itself an ambiguous set of rules or heuristics with no clear boundary, and its poorly defined limits present the risk of leaks of control.
Moreover, whatever behavior we place in the unallowable set must be protected so that it cannot be self-modified by the AGI. By what mechanism do we think that will be achievable?
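To make the problem concrete, here is a minimal sketch in Python of what a "protected" constraint set might look like. All of the names (ConstraintGuard, the action labels) are invented for illustration; this is a toy, not a proposed mechanism.

```python
# A toy agent proposes string-labeled actions; a guard filters them
# against a constraint set the agent is not supposed to be able to edit.

DISALLOWED = frozenset({"self_modify_guard", "disable_oversight"})

class ConstraintGuard:
    """Checks proposed actions against a fixed disallowed set."""

    def __init__(self, disallowed):
        # frozenset gives immutability at the language level only --
        # nothing stops a sufficiently capable agent from patching
        # this class or the process it runs in, which is exactly
        # the enforcement question posed above.
        self._disallowed = frozenset(disallowed)

    def permit(self, action: str) -> bool:
        return action not in self._disallowed

guard = ConstraintGuard(DISALLOWED)

def execute(action: str) -> None:
    if guard.permit(action):
        print(f"executing: {action}")
    else:
        print(f"blocked: {action}")

execute("answer_question")    # executing: answer_question
execute("self_modify_guard")  # blocked: self_modify_guard
```

The guard only binds for as long as the agent cannot reach the guard's own code, which is precisely the part no mechanism here guarantees.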
phaedrux_pharo t1_jadrn73 wrote
> By what mechanism do we think that will be achievable?
By "correctly" setting up the basic incentives, and/or integration with biological human substrates. Some ambiguity is unavoidable, some risk is unavoidable. One way to approach the issue is from the opposite direction:
What do we not do? Well, let's not create systems whose goals are to deliberately extinguish life on earth. Let's not create torture bots, let's not create systems that are "obviously" misaligned.
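To gesture at what "setting up the basic incentives" could look like mechanically, here is a hedged sketch: a reward function that hard-penalizes a class of disallowed outcomes. The outcome labels and weights are invented for illustration, not taken from any real system.

```python
# Task reward is overridden by a dominating penalty whenever any
# disallowed outcome occurred during the episode.

DISALLOWED_OUTCOMES = {"harm_to_humans", "deception"}

def reward(task_score: float, outcomes: set[str]) -> float:
    if outcomes & DISALLOWED_OUTCOMES:  # any overlap with the banned set
        return -1e6  # dominates any achievable task_score
    return task_score

print(reward(10.0, {"task_done"}))               # 10.0
print(reward(10.0, {"task_done", "deception"}))  # -1000000.0
```

The catch, of course, is the labeling: deciding whether an episode actually counts as "deception" reintroduces all the ambiguity you pointed out.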
Unfortunately, I'm afraid we've already created such systems. It's a tough problem.
The only solution I'm completely on board with is everyone ceding total control to my particular set of ethics and allowing me to become a singular bio-ASI god-king, but that seems unlikely.
Ultimately I doubt the alarms being raised by alignment folks are going to have much effect. Entities with a monopoly on violence are existentially committed to those monopolies, and I suspect they will be the ones to instantiate some of the first ASIs, with obvious goals in mind. So the question of alignment is kind of a red herring to me, since purposefully unaligned systems will probably be developed first anyway.
Liberty2012 OP t1_jadt7dr wrote
Yes, I think you nailed it with this response. That aligns very closely with what I've called the Bias Paradox: essentially, humanity cannot escape its own flaws through the creation of AI.
We will inevitably end up encoding our own flaws back into the system in one manner or another. It is a feedback loop from which we cannot escape.
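As a toy illustration, here is a simulation of that loop, with a single scalar standing in for "bias": the model learns the bias present in its training data, and its outputs become part of the next round's data. All of the numbers are invented.

```python
human_bias = 0.30   # bias carried by the original human-produced data
retention = 1.0     # the model reproduces the bias present in its data
mix = 0.5           # share of future training data that is model output

data_bias = human_bias
for generation in range(1, 6):
    model_bias = retention * data_bias
    # the next training set blends fresh human data with model output
    data_bias = (1 - mix) * human_bias + mix * model_bias
    print(f"gen {generation}: model bias = {model_bias:.3f}")
```

The loop settles at a nonzero fixed point: as long as the human data carries the flaw and the model reproduces what it is trained on, no amount of retraining drives it to zero.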
I believe there is ultimately a very stark contrast between the visions people have of what "could be" and the reality of what "will be".
I elaborate more thoroughly here, FYI: https://dakara.substack.com/p/ai-the-bias-paradox