Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7999-8028, 2023.

Abstract

This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-abe23a, title = {Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games}, author = {Abe, Kenshi and Ariu, Kaito and Sakamoto, Mitsuki and Toyoshima, Kentaro and Iwasaki, Atsushi}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {7999--8028}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/abe23a/abe23a.pdf}, url = {https://proceedings.mlr.press/v206/abe23a.html}, abstract = {This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.} }
Endnote
%0 Conference Paper %T Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games %A Kenshi Abe %A Kaito Ariu %A Mitsuki Sakamoto %A Kentaro Toyoshima %A Atsushi Iwasaki %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-abe23a %I PMLR %P 7999--8028 %U https://proceedings.mlr.press/v206/abe23a.html %V 206 %X This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
APA
Abe, K., Ariu, K., Sakamoto, M., Toyoshima, K. & Iwasaki, A.. (2023). Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7999-8028 Available from https://proceedings.mlr.press/v206/abe23a.html.

Related Material