
genericrich t1_jef6keo wrote

Is it even possible to "align" a system, if you can't reliably understand what is happening inside it? How can you be sure it isn't deceiving you?


vivehelpme t1_jefdfv0 wrote

We can't align a hammer to not hit your fingers, or a human to not become a criminal. Thinking a dynamic, multi-contextual system will somehow become a paragon of sainthood is ridiculous.

And no matter how many alignment training sets you have, it all goes out the window as soon as someone needs a military AI that kills people and ignores those sets.


Acalme-se_Satan t1_jefgy8g wrote

I believe it's impossible to guarantee alignment, but it's probably very possible to make a system aligned 99.99% of the time with smart techniques.


Ambiwlans t1_jefp92b wrote

Sort of. We understand more of what is happening internally than you might think, and we could develop that further. Or, better, develop a secondary AI whose job is to determine what the main AI is thinking.

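The "secondary AI that determines what the main AI is thinking" idea in the last comment resembles what interpretability researchers call a probe: a small classifier trained on a model's hidden activations to read off whether some internal property is represented. Below is a minimal sketch with synthetic stand-in activations; everything here (the fake activation generator, the planted "signal" feature) is invented for illustration, not taken from any real model.

```python
import math
import random

random.seed(0)

# Toy stand-in for a model's hidden activations: 8-dim vectors whose
# first feature weakly encodes a hypothetical internal property (y).
def fake_activation(label):
    vec = [random.gauss(0, 1) for _ in range(8)]
    vec[0] += 2.0 if label else -2.0  # the planted signal the probe should find
    return vec

data = [(fake_activation(y), y) for y in [0, 1] * 200]

# Linear probe (logistic regression) trained with plain gradient descent.
w = [0.0] * 8
b = 0.0
lr = 0.05
for _ in range(50):
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of y = 1
        g = p - y                        # gradient of the logistic loss
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# How often the probe recovers the hidden property from activations alone.
correct = 0
for x, y in data:
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    correct += (z > 0) == bool(y)
accuracy = correct / len(data)
print(f"probe accuracy: {accuracy:.2f}")
```

The point of the sketch is only the shape of the approach: the probe never sees the "main model's" inputs or outputs, just its internal state, yet it can recover a property the model represents. Real probing work faces the harder problem that a deceptive model's internals may not expose such a convenient linear signal.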