Submitted by xutw21 t3_yjryrd in MachineLearning
ARGleave t1_iutmvdj wrote
Reply to comment by KellinPelrine in [N] Adversarial Policies Beat Professional-Level Go AIs by xutw21
>Or if its estimated value is off from what it should be. Perhaps for some reason it learns to play on the edge, so to speak, by throwing parts of its territory away when it doesn't need it to still win, and that leads to the lack of robustness here where it throws away territory it really does need.
That's quite possible -- although it learns to predict the score with an auxiliary head, the value function being optimized is the predicted win rate, so if it thinks it's far ahead on score, it would be happy to sacrifice some points to get what it thinks is a surer win. Notably, the victim's value function (predicted win rate) is usually >99.9% even on the penultimate move, where it passes and has effectively thrown the game.
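To make the win-rate-versus-score point concrete, here is a toy sketch (not KataGo's actual code). It assumes the final score margin after each candidate move is normally distributed; the move names and numbers are made up for illustration. An agent that maximizes win probability prefers the move with the lower expected score, because that move has much less variance.

```python
from statistics import NormalDist


def win_probability(mean_margin: float, std_margin: float) -> float:
    """P(final score margin > 0), assuming a normal model of the margin."""
    return 1.0 - NormalDist(mean_margin, std_margin).cdf(0.0)


# Hypothetical candidate moves: (expected score margin, std of margin).
candidates = {
    "keep_territory": (20.0, 15.0),    # more points, but riskier
    "sacrifice_points": (8.0, 2.0),    # fewer points, but very safe
}

# A score maximizer picks the move with the highest expected margin.
best_by_score = max(candidates, key=lambda m: candidates[m][0])

# A win-rate maximizer picks the move with the highest P(margin > 0).
best_by_win_rate = max(
    candidates, key=lambda m: win_probability(*candidates[m])
)

for move, (mean, std) in candidates.items():
    print(f"{move}: E[margin]={mean:+.1f}, P(win)={win_probability(mean, std):.5f}")
print("score maximizer picks:", best_by_score)
print("win-rate maximizer picks:", best_by_win_rate)
```

The catch is that this choice is only as good as the agent's own estimates: if the believed margin or variance after the sacrifice is wrong, the same decision rule gives away points the agent actually needed.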