Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games

Maria-Florina Balcan, Rattana Pukdee, Pradeep Ravikumar, Hongyang Zhang
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:9607-9636, 2023.

Abstract

Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best-response strategy of such a game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of the Nash equilibrium.
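
To make the alternating best-response dynamic concrete, the following NumPy sketch plays out the two-player game in the paper's simple setting of a linear classifier under an l_inf-bounded attacker. It is only an illustration under assumptions made here: the helper names (attacker_best_response, learner_best_response, alternating_best_response), the logistic-loss learner trained by gradient descent, the radius eps, and the synthetic two-feature data are our own choices, not the authors' construction or the statistical model analyzed in the paper. The attacker's best response to a fixed linear classifier shifts each coordinate against the label by eps; the learner then refits on the perturbed data, and inspecting the sequence of weight vectors shows whether the dynamic settles or keeps oscillating.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attacker_best_response(X, y, w, eps):
    # Against a fixed linear classifier sign(X @ w), the loss-maximizing
    # l_inf perturbation of radius eps pushes every coordinate against the
    # label: x -> x - eps * y * sign(w).
    return X - eps * y[:, None] * np.sign(w)[None, :]

def learner_best_response(X_adv, y, lr=0.1, steps=500):
    # Approximate the learner's best response by minimizing logistic loss on
    # the currently perturbed data with plain gradient descent (an assumption
    # of this sketch; the paper's analysis uses its own loss and model).
    w = np.zeros(X_adv.shape[1])
    for _ in range(steps):
        margins = y * (X_adv @ w)
        grad = -(X_adv * (y * sigmoid(-margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def alternating_best_response(X, y, eps=0.1, rounds=20, seed=0):
    # Alternating best response: the attacker perturbs the data against the
    # current classifier, then the learner refits on the perturbed data.
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(X.shape[1])
    history = []
    for _ in range(rounds):
        X_adv = attacker_best_response(X, y, w, eps)
        w = learner_best_response(X_adv, y)
        history.append(w.copy())
    return w, history  # oscillating entries in `history` indicate non-convergence

# Usage on synthetic two-feature Gaussian data with labels in {-1, +1}.
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
X = y[:, None] * np.array([2.0, 0.5]) + rng.standard_normal((200, 2))
w_final, history = alternating_best_response(X, y, eps=0.3)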

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-balcan23a,
  title     = {Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games},
  author    = {Balcan, Maria-Florina and Pukdee, Rattana and Ravikumar, Pradeep and Zhang, Hongyang},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages     = {9607--9636},
  year      = {2023},
  editor    = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume    = {206},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--27 Apr},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v206/balcan23a/balcan23a.pdf},
  url       = {https://proceedings.mlr.press/v206/balcan23a.html},
  abstract  = {Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best response strategy of such game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of Nash equilibrium.}
}
Endnote
%0 Conference Paper
%T Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games
%A Maria-Florina Balcan
%A Rattana Pukdee
%A Pradeep Ravikumar
%A Hongyang Zhang
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-balcan23a
%I PMLR
%P 9607--9636
%U https://proceedings.mlr.press/v206/balcan23a.html
%V 206
%X Adversarial training is a standard technique for training adversarially robust models. In this paper, we study adversarial training as an alternating best-response strategy in a 2-player zero-sum game. We prove that even in a simple scenario of a linear classifier and a statistical model that abstracts robust vs. non-robust features, the alternating best response strategy of such game may not converge. On the other hand, a unique pure Nash equilibrium of the game exists and is provably robust. We support our theoretical results with experiments, showing the non-convergence of adversarial training and the robustness of Nash equilibrium.
APA
Balcan, M., Pukdee, R., Ravikumar, P., & Zhang, H. (2023). Nash Equilibria and Pitfalls of Adversarial Training in Adversarial Robustness Games. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:9607-9636. Available from https://proceedings.mlr.press/v206/balcan23a.html.
