
acutelychronicpanic t1_jecoprq wrote

My mental model is based on this:

Approximate alignment will be much easier than perfect alignment. I think it's achievable to have AI with superhuman insight that is well enough aligned that it would take deliberate prodding or jailbreaking to get it to model malicious action. I would argue that in many domains, GPT-4 already fits this description.

Regarding roughly equivalent models: I think the intelligence required to act effectively in the world grows exponentially as you attempt more complicated things or act further into the future. My intuition comes from the difficulty of predicting the future in chaotic systems, and society is one such system. I don't think 10x intelligence will necessarily lead to a 10x increase in competence; I strongly suspect we underestimate the complexity of the world. This may buy us a lot of time by flattening the peaks in the global intelligence landscape, to the extent that humans using narrow AI and proto-AGI may have a good chance.
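
For a toy illustration of that intuition (a minimal sketch; the logistic map and the specific values are stand-ins I'm assuming, not a model of society): in a chaotic system, two nearly identical starting states diverge exponentially, so each extra step of reliable forecast costs exponentially more measurement precision.

```python
# Minimal sketch: exponential divergence in the logistic map, a toy
# chaotic system. r = 4.0 is the standard fully-chaotic parameter;
# the 1e-10 gap and the step counts are illustrative choices only.

def logistic(x, r=4.0):
    return r * x * (1.0 - x)

x, y = 0.4, 0.4 + 1e-10  # two almost-identical starting states
for step in range(1, 51):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: divergence = {abs(x - y):.3e}")
```

Running this, the 1e-10 gap blows up to order 1 within a few dozen steps. That's the sense in which 10x more raw predictive power buys only a modestly longer planning horizon, not 10x more competence.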

I do know that, regardless of whether the AI alignment problem can be solved, the largest institutions currently working on AI are not themselves well aligned with humanity. The ones that would continue working despite a global effort to slow AI especially cannot be trusted.

I'm willing to read any resources you want to point me to, or any arguments you want to make. I'd rather be corrected if possible.
