Think_Olive_1000 t1_j17ndks wrote
Reply to comment by SendMePicsOfCat in Why do so many people assume that a sentient AI will have any goals, desires, or objectives outside of what it’s told to do? by SendMePicsOfCat
https://openai.com/blog/faulty-reward-functions/
First result I get when I google reinforcement learning short circuit.
Pretty well known issue breh
>The RL agent finds an isolated lagoon where it can turn in a large circle and repeatedly knock over three targets, timing its movement so as to always knock over the targets just as they repopulate. Despite repeatedly catching on fire, crashing into other boats, and going the wrong way on the track, our agent manages to achieve a higher score using this strategy than is possible by completing the course in the normal way. Our agent achieves a score on average 20 percent higher than that achieved by human players.
It's short circuiting its reward function. You'll be amazed how many words their are to describe something going faulty. Short circuit seemed appropriate and is appropriate to describe what's happening here.
ExternaJudgment t1_j19kgn2 wrote
We have a word for this in wetware world: Cocaine.
IcebergSlimFast t1_j1aocrs wrote
Exactly: it’s not so much the goal that’s the issue, it’s how an incredibly powerful, fast and resourceful AI seeks to fulfill its goal.
SendMePicsOfCat OP t1_j18wx97 wrote
That's not what shows up when I google it, so thanks for clarifying. This is not what you think it is though. What's happening in these scenarios is that the reinforcement algorithm is too simple and lacks negative feedback to ensure an appropriate actions. There is nothing inherently wrong with the system, only that it is poorly designed.
This happened because the only reward value that affected it's learning was the final score. Thus it figured out a way to maximize that score. The only error here was user and designer error, nothing went wrong with the AI, it did it's task to fullness of it's capabilities.
AGI will be developed with very clear limitations, like what we're already seeing being tested and implemented with chatGPT. There will be things it's not allowed to, and a lot of them. And short circuit doesn't really make sense, this is the classic alignment issue, which as I stated in my post, really isn't a big issue in the future.
Surur t1_j19q65g wrote
Consider that even humans have alignment issues, and that there is a real concern Putin would nuke USA, you will see the fears are actually far from overblown.
Viewing a single comment thread. View all comments