Adversarial Policy Learning in Two-player Competitive Games

Wenbo Guo, Xian Wu, Sui Huang, Xinyu Xing
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:3910-3919, 2021.

Abstract

In a two-player deep reinforcement learning task, recent work shows that an attacker can learn an adversarial policy that causes a target agent to perform poorly or even react in undesired ways. However, the attack's efficacy relies heavily on the zero-sum assumption made in the two-player game. In this work, we propose a new adversarial learning algorithm that addresses this problem by resetting the optimization goal in the learning process and designing a new surrogate optimization function. Our experiments show that our method significantly improves adversarial agents' exploitability compared with the state-of-the-art attack. In addition, we find that our method can equip an agent with the ability to abuse the target game's unfairness. Finally, we show that agents adversarially retrained against our adversarial agents acquire stronger resistance to adversaries.
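To make the setting concrete, below is a minimal, self-contained sketch of the general problem the abstract describes: learning an adversarial policy against a fixed (frozen) victim in a two-player competitive game. This is an illustration only, not the paper's algorithm; the rock-paper-scissors game, the biased victim policy, and the REINFORCE-style update are all assumptions introduced here for exposition.

    # Toy sketch (not the paper's method): adversarial policy learning
    # against a *frozen* victim in a two-player competitive game.
    # Game, victim bias, and hyperparameters are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    # Attacker payoff PAYOFF[a_att, a_vic] in {-1, 0, +1};
    # actions: 0 = rock, 1 = paper, 2 = scissors.
    PAYOFF = np.array([[ 0, -1,  1],
                       [ 1,  0, -1],
                       [-1,  1,  0]], dtype=float)

    # Fixed victim policy with an exploitable bias toward "rock".
    victim_probs = np.array([0.5, 0.25, 0.25])

    theta = np.zeros(3)   # attacker policy logits
    lr = 0.1              # learning rate

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for step in range(2000):
        probs = softmax(theta)
        a_att = rng.choice(3, p=probs)          # attacker samples an action
        a_vic = rng.choice(3, p=victim_probs)   # frozen victim responds
        r = PAYOFF[a_att, a_vic]                # attacker's reward
        # REINFORCE for a softmax policy: grad log pi(a) = one_hot(a) - probs
        grad = -probs
        grad[a_att] += 1.0
        theta += lr * r * grad

    print("learned attacker policy:", np.round(softmax(theta), 3))

Against this rock-heavy victim, the learned policy concentrates on "paper", the best response; the paper's contribution concerns what the right training objective is when, unlike here, the game is not zero-sum.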

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-guo21b,
  title     = {Adversarial Policy Learning in Two-player Competitive Games},
  author    = {Guo, Wenbo and Wu, Xian and Huang, Sui and Xing, Xinyu},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {3910--3919},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/guo21b/guo21b.pdf},
  url       = {https://proceedings.mlr.press/v139/guo21b.html}
}
Endnote
%0 Conference Paper
%T Adversarial Policy Learning in Two-player Competitive Games
%A Wenbo Guo
%A Xian Wu
%A Sui Huang
%A Xinyu Xing
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-guo21b
%I PMLR
%P 3910--3919
%U https://proceedings.mlr.press/v139/guo21b.html
%V 139
APA
Guo, W., Wu, X., Huang, S., & Xing, X. (2021). Adversarial Policy Learning in Two-player Competitive Games. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:3910-3919. Available from https://proceedings.mlr.press/v139/guo21b.html.
