$\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games

Weichao Mao, Haoran Qiu, Chen Wang, Hubertus Franke, Zbigniew Kalbarczyk, Tamer Başar
Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:361-374, 2024.

Abstract

No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.
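To make the learning dynamics concrete, below is a minimal, hedged sketch of a generic OFTRL update with an entropy regularizer (i.e., optimistic multiplicative weights) for a single player receiving full-information utility vectors. This is only an illustration of the OFTRL principle the abstract refers to; the names `utility_fn` and `eta` are illustrative assumptions, and the sketch does not include the paper's Markov-game value update procedures.

```python
import numpy as np

def oftrl_entropy(utility_fn, num_actions, T, eta=0.1):
    """Sketch of OFTRL with an entropy regularizer (optimistic multiplicative
    weights) for one player under full information. Illustrative only; not the
    paper's exact algorithm for Markov games."""
    cum_utility = np.zeros(num_actions)   # sum of past utility vectors
    last_utility = np.zeros(num_actions)  # optimistic prediction: reuse the most recent utility
    policies = []
    for t in range(T):
        # OFTRL step: maximize <x, cum_utility + last_utility> - (1/eta) * negative-entropy(x),
        # whose closed form is a softmax over the (optimistically shifted) cumulative utilities.
        logits = eta * (cum_utility + last_utility)
        logits -= logits.max()             # numerical stability
        policy = np.exp(logits)
        policy /= policy.sum()
        policies.append(policy)
        u = utility_fn(policy, t)          # full-information utility vector observed at round t
        cum_utility += u
        last_utility = u
    # Averaging play over the T rounds gives the empirical distribution whose
    # regret-based guarantees underlie convergence to (coarse) correlated equilibria.
    return np.mean(policies, axis=0), policies
```

In the multi-player setting, each player would run such an update on its own utility feedback, and the joint empirical distribution of play (rather than any single player's average policy) is the object that approaches a (coarse) correlated equilibrium.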

Cite this Paper


BibTeX
@InProceedings{pmlr-v242-mao24a,
  title     = {$\widetilde{O}(T^{-1})$ {C}onvergence to (coarse) correlated equilibria in full-information general-sum {M}arkov games},
  author    = {Mao, Weichao and Qiu, Haoran and Wang, Chen and Franke, Hubertus and Kalbarczyk, Zbigniew and Ba\c{s}ar, Tamer},
  booktitle = {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages     = {361--374},
  year      = {2024},
  editor    = {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume    = {242},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--17 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v242/mao24a/mao24a.pdf},
  url       = {https://proceedings.mlr.press/v242/mao24a.html},
  abstract  = {No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.}
}
Endnote
%0 Conference Paper
%T $\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games
%A Weichao Mao
%A Haoran Qiu
%A Chen Wang
%A Hubertus Franke
%A Zbigniew Kalbarczyk
%A Tamer Başar
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-mao24a
%I PMLR
%P 361--374
%U https://proceedings.mlr.press/v242/mao24a.html
%V 242
%X No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.
APA
Mao, W., Qiu, H., Wang, C., Franke, H., Kalbarczyk, Z., & Başar, T. (2024). $\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:361-374. Available from https://proceedings.mlr.press/v242/mao24a.html.