Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

Jing Dong, Baoxiang Wang, Yaoliang Yu
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:2044-2052, 2024.

Abstract

In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps. Through carefully balancing the reuse of past samples and exploration of new samples, we then extend the results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which provides more flexibility in its practical implementation. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.
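To make the abstract's description concrete, below is a minimal single-player sketch of the general technique it names: a Frank-Wolfe update over the probability simplex with uniform exploration mixing and a recursive (momentum-style) gradient estimate built from bandit feedback. This is an illustration under stated assumptions, not the authors' algorithm: the cost oracle bandit_cost, the exploration weight gamma_explore, and the step-size schedules eta and alpha are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 5          # size of the player's action set (assumption)
T = 2000               # horizon
gamma_explore = 0.05   # exploration mixing weight (hypothetical choice)

def bandit_cost(action):
    # Stand-in stochastic cost oracle; in the paper's setting this would be
    # the player's realized cost under bandit feedback.
    base = np.linspace(0.2, 0.8, n_actions)
    return base[action] + 0.1 * rng.standard_normal()

x = np.full(n_actions, 1.0 / n_actions)   # mixed strategy on the simplex
d = np.zeros(n_actions)                   # recursive gradient estimate

for t in range(1, T + 1):
    eta = t ** (-3 / 5)    # Frank-Wolfe step size (hypothetical schedule)
    alpha = t ** (-2 / 5)  # recursion weight (hypothetical schedule)

    # Mix in uniform exploration so every action keeps probability mass,
    # which keeps the importance weights below bounded.
    p = (1 - gamma_explore) * x + gamma_explore / n_actions
    a = rng.choice(n_actions, p=p)
    c = bandit_cost(a)

    # One-point importance-weighted gradient estimate from bandit feedback.
    g = np.zeros(n_actions)
    g[a] = c / p[a]

    # Recursive gradient estimation: blend the fresh estimate with the
    # running one, reusing past samples to reduce variance.
    d = (1 - alpha) * d + alpha * g

    # Frank-Wolfe step: the linear minimizer over the simplex is a vertex,
    # so the iterate stays feasible without any projection step.
    v = np.zeros(n_actions)
    v[np.argmin(d)] = 1.0
    x = (1 - eta) * x + eta * v

print("final strategy:", np.round(x, 3))
```

Because the linear minimizer over the simplex is always a vertex, each update is a convex combination of feasible points; this is why a Frank-Wolfe scheme can avoid the additional projection steps the abstract mentions.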

Cite this Paper

BibTeX
@InProceedings{pmlr-v238-dong24a,
  title     = {Convergence to {N}ash Equilibrium and No-regret Guarantee in ({M}arkov) Potential Games},
  author    = {Dong, Jing and Wang, Baoxiang and Yu, Yaoliang},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {2044--2052},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/dong24a/dong24a.pdf},
  url       = {https://proceedings.mlr.press/v238/dong24a.html},
  abstract  = {In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps. Through carefully balancing the reuse of past samples and exploration of new samples, we then extend the results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which provides more flexibility in its practical implementation. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.}
}
Endnote
%0 Conference Paper
%T Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games
%A Jing Dong
%A Baoxiang Wang
%A Yaoliang Yu
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-dong24a
%I PMLR
%P 2044--2052
%U https://proceedings.mlr.press/v238/dong24a.html
%V 238
%X In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games, which matches the best available result, without using additional projection steps. Through carefully balancing the reuse of past samples and exploration of new samples, we then extend the results to Markov potential games and improve the best available Nash regret from $O(T^{5/6})$ to $O(T^{4/5})$. Moreover, our algorithm requires no knowledge of the game, such as the distribution mismatch coefficient, which provides more flexibility in its practical implementation. Experimental results corroborate our theoretical findings and underscore the practical effectiveness of our method.
APA
Dong, J., Wang, B. & Yu, Y. (2024). Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:2044-2052. Available from https://proceedings.mlr.press/v238/dong24a.html.
