Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3889-3897, 2024.

Abstract

We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov games. Previous results achieve an $\tilde{O}(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $\tilde{O}(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates, and (ii) the \emph{optimistic follow-the-regularized-leader} (OFTRL) algorithm with the log-barrier regularizer.
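To make the two ingredients concrete, below is a minimal Python sketch of a single OFTRL step with the log-barrier regularizer over the probability simplex, followed by a smoothed (incrementally averaged) value update. Everything here is illustrative: the helper name oftrl_log_barrier_step, the generic SLSQP solver, the random reward vectors, and the 1/t averaging weights are assumptions made for the sketch, not the paper's actual update rules or learning-rate schedules.

import numpy as np
from scipy.optimize import minimize

def oftrl_log_barrier_step(cum_grad, prediction, eta):
    """One OFTRL step over the probability simplex with the log-barrier
    regularizer psi(x) = -sum_i log(x_i):
        x_{t+1} = argmax_x <x, cum_grad + prediction> - (1/eta) * psi(x).
    Solved here with a generic constrained optimizer (an assumption; any
    interior-point/Newton solver for this concave problem would do)."""
    d = len(cum_grad)
    target = np.asarray(cum_grad) + np.asarray(prediction)
    neg_obj = lambda x: -(eta * x @ target + np.sum(np.log(x)))
    cons = ({"type": "eq", "fun": lambda x: np.sum(x) - 1.0},)
    bounds = [(1e-9, 1.0)] * d           # keep iterates strictly interior
    x0 = np.full(d, 1.0 / d)
    return minimize(neg_obj, x0, bounds=bounds, constraints=cons,
                    method="SLSQP").x

# Toy usage: one player's policy at a single state with random reward
# vectors (hypothetical data, not the Markov-game dynamics).
rng = np.random.default_rng(0)
d, eta, T = 4, 0.1, 50
cum_grad, last_grad, value = np.zeros(d), np.zeros(d), 0.0
for t in range(1, T + 1):
    x = oftrl_log_barrier_step(cum_grad, last_grad, eta)  # optimism: predict next g by g_{t-1}
    g = rng.random(d)            # observed reward vector for this round
    cum_grad += g
    last_grad = g
    # "Smooth value update": incremental averaging with weight 1/t
    # (the paper's weighting schedule may differ).
    value = (1 - 1.0 / t) * value + (x @ g) / t

Intuitively, the log-barrier term keeps each policy strictly in the interior of the simplex, which stabilizes consecutive iterates; the optimistic prediction and the smooth value updates are what allow the faster $\tilde{O}(T^{-1})$ rate.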

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-cai24a,
  title     = {Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum {M}arkov Games},
  author    = {Cai, Yang and Luo, Haipeng and Wei, Chen-Yu and Zheng, Weiqiang},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3889--3897},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/cai24a/cai24a.pdf},
  url       = {https://proceedings.mlr.press/v238/cai24a.html}
}
APA
Cai, Y., Luo, H., Wei, C.-Y., & Zheng, W. (2024). Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3889-3897. Available from https://proceedings.mlr.press/v238/cai24a.html.
