
genericrich t1_jef6keo wrote

Is it even possible to "align" a system, if you can't reliably understand what is happening inside it? How can you be sure it isn't deceiving you?


vivehelpme t1_jefdfv0 wrote

We can't align a hammer to not hit your fingers, or a human to not become a criminal. Thinking a dynamic, multi-contextual system will somehow become a paragon of sainthood is ridiculous.

And no matter how many alignment training sets you have, it all goes out the window as soon as someone needs a military AI that kills people and ignores those sets.


Acalme-se_Satan t1_jefgy8g wrote

I believe it's impossible to guarantee alignment, but it's probably very possible to make a system aligned 99.99% of the time with smart techniques.


Ambiwlans t1_jefp92b wrote

Sort of. We understand more of what is happening internally than you might think, and we could develop that further. Or, better, develop a secondary AI whose job is to determine what the main AI is thinking.

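The "secondary AI that determines what the main AI is thinking" idea in the last comment resembles what interpretability researchers call a probe: a small classifier trained on a model's hidden activations to read off whether some internal property is represented. Below is a minimal sketch with synthetic stand-in activations; everything here (the fake activation generator, the planted "signal" feature) is invented for illustration, not taken from any real model.

```python
import math
import random

random.seed(0)

# Toy stand-in for a model's hidden activations: 8-dim vectors whose
# first feature weakly encodes a hypothetical internal property (y).
def fake_activation(label):
    vec = [random.gauss(0, 1) for _ in range(8)]
    vec[0] += 2.0 if label else -2.0  # the planted signal the probe should find
    return vec

data = [(fake_activation(y), y) for y in [0, 1] * 200]

# Linear probe (logistic regression) trained with plain gradient descent.
w = [0.0] * 8
b = 0.0
lr = 0.05
for _ in range(50):
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of y = 1
        g = p - y                        # gradient of the logistic loss
        w = [wi - lr * g * xi for wi, xi in zip(w, x)]
        b -= lr * g

# How often the probe recovers the hidden property from activations alone.
correct = 0
for x, y in data:
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    correct += (z > 0) == bool(y)
accuracy = correct / len(data)
print(f"probe accuracy: {accuracy:.2f}")
```

The point of the sketch is only the shape of the approach: the probe never sees the "main model's" inputs or outputs, just its internal state, yet it can recover a property the model represents. Real probing work faces the harder problem that a deceptive model's internals may not expose such a convenient linear signal.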