Submitted by jackfaker t3_126wg0o in MachineLearning
ReasonableObjection t1_jedzrt7 wrote
Reply to comment by dansmonrer in [D] AI Explainability and Alignment through Natural Language Internal Interfaces by jackfaker
Thank you for your detailed response. So, to be clear, are you saying that things like emergent goals in independent agents, or those agents having convergent instrumental goals, are made up or not a problem? Do you have any resources that describe intelligence, or ways of solving the alignment problem, that are not dangerous? I'm aware of some research that looks promising, but I'm curious if you have others.
dansmonrer t1_jeg67bc wrote
Not at all made up in my opinion! There just doesn't seem to be any consensus framework at the moment, and various people are scrambling to put relevant concepts together, often disagreeing on what makes sense. It's particularly hard for AI alignment because it requires you to define the dangers you want to talk about, and therefore to have a model of an open environment in which the agent is supposed to operate, something we currently have no notion of, nor any example of. This makes the examples that people in AI alignment bring up very speculative and poorly grounded, which allows for easy criticism. I'm curious, though, if you have interesting research examples in mind!