$\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games

Weichao Mao; Haoran Qiu; Chen Wang; Hubertus Franke; Zbigniew Kalbarczyk; Tamer Başar

Proceedings of the 6th Annual Learning for Dynamics & Control Conference, PMLR 242:361-374, 2024.

Abstract

No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.

Cite this Paper

BibTeX


@InProceedings{pmlr-v242-mao24a,
  title = 	 {{$\widetilde{O}(T^{-1})$} {C}onvergence to (coarse) correlated equilibria in full-information general-sum {M}arkov games},
  author =       {Mao, Weichao and Qiu, Haoran and Wang, Chen and Franke, Hubertus and Kalbarczyk, Zbigniew and Ba\c{s}ar, Tamer},
  booktitle = 	 {Proceedings of the 6th Annual Learning for Dynamics \& Control Conference},
  pages = 	 {361--374},
  year = 	 {2024},
  editor = 	 {Abate, Alessandro and Cannon, Mark and Margellos, Kostas and Papachristodoulou, Antonis},
  volume = 	 {242},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--17 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v242/mao24a/mao24a.pdf},
  url = 	 {https://proceedings.mlr.press/v242/mao24a.html},
  abstract = 	 {No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.}
}

Endnote

%0 Conference Paper
%T $\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games
%A Weichao Mao
%A Haoran Qiu
%A Chen Wang
%A Hubertus Franke
%A Zbigniew Kalbarczyk
%A Tamer Başar
%B Proceedings of the 6th Annual Learning for Dynamics & Control Conference
%C Proceedings of Machine Learning Research
%D 2024
%E Alessandro Abate
%E Mark Cannon
%E Kostas Margellos
%E Antonis Papachristodoulou
%F pmlr-v242-mao24a
%I PMLR
%P 361--374
%U https://proceedings.mlr.press/v242/mao24a.html
%V 242
%X No-regret learning has a long history of being closely connected to game theory. Recent works have devised uncoupled no-regret learning dynamics that, when adopted by all the players in normal-form games, converge to various equilibrium solutions at a near-optimal rate of $\widetilde{O}(T^{-1})$, a significant improvement over the $O(1/\sqrt{T})$ rate of classic no-regret learners. However, analogous convergence results are scarce in Markov games, a more generic setting that lays the foundation for multi-agent reinforcement learning. In this work, we close this gap by showing that the optimistic-follow-the-regularized-leader (OFTRL) algorithm, together with appropriate value update procedures, can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations. Numerical results are also included to corroborate our theoretical findings.

APA


Mao, W., Qiu, H., Wang, C., Franke, H., Kalbarczyk, Z., & Başar, T. (2024). $\widetilde{O}(T^{-1})$ Convergence to (coarse) correlated equilibria in full-information general-sum Markov games. Proceedings of the 6th Annual Learning for Dynamics & Control Conference, in Proceedings of Machine Learning Research 242:361-374. Available from https://proceedings.mlr.press/v242/mao24a.html.
