A Natural Actor-Critic Framework for Zero-Sum Markov Games

Ahmet Alacaoglu; Luca Viano; Niao He; Volkan Cevher

A Natural Actor-Critic Framework for Zero-Sum Markov Games

Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher

Proceedings of the 39th International Conference on Machine Learning, PMLR 162:307-366, 2022.

Abstract

We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to Nash equilibrium in the multi-agent setting. We use the error propagation scheme in approximate dynamic programming, recent advances for global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring agents to agree on an etiquette before starting their interactions, which is feasible for instance in self-play. However, the agents only access to joint reward and joint next state and not to each other’s actions or policies. Our complexity results match the best-known results for global convergence of policy gradient algorithms for single agent RL. We provide numerical verification of our methods for a two player bandit environment and a two player game, Alesia. We observe improved empirical performance as compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.

Cite this Paper

BibTeX


@InProceedings{pmlr-v162-alacaoglu22a,
  title = 	 {A Natural Actor-Critic Framework for Zero-Sum {M}arkov Games},
  author =       {Alacaoglu, Ahmet and Viano, Luca and He, Niao and Cevher, Volkan},
  booktitle = 	 {Proceedings of the 39th International Conference on Machine Learning},
  pages = 	 {307--366},
  year = 	 {2022},
  editor = 	 {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume = 	 {162},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {17--23 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v162/alacaoglu22a/alacaoglu22a.pdf},
  url = 	 {https://proceedings.mlr.press/v162/alacaoglu22a.html},
  abstract = 	 {We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to Nash equilibrium in the multi-agent setting. We use the error propagation scheme in approximate dynamic programming, recent advances for global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring agents to agree on an etiquette before starting their interactions, which is feasible for instance in self-play. However, the agents only access to joint reward and joint next state and not to each other’s actions or policies. Our complexity results match the best-known results for global convergence of policy gradient algorithms for single agent RL. We provide numerical verification of our methods for a two player bandit environment and a two player game, Alesia. We observe improved empirical performance as compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.}
}

Endnote

%0 Conference Paper
%T A Natural Actor-Critic Framework for Zero-Sum Markov Games
%A Ahmet Alacaoglu
%A Luca Viano
%A Niao He
%A Volkan Cevher
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato	
%F pmlr-v162-alacaoglu22a
%I PMLR
%P 307--366
%U https://proceedings.mlr.press/v162/alacaoglu22a.html
%V 162
%X We introduce algorithms based on natural actor-critic and analyze their sample complexity for solving two player zero-sum Markov games in the tabular case. Our results improve the best-known sample complexities of policy gradient/actor-critic methods for convergence to Nash equilibrium in the multi-agent setting. We use the error propagation scheme in approximate dynamic programming, recent advances for global convergence of policy gradient methods, temporal difference learning, and techniques from stochastic primal-dual optimization. Our algorithms feature two stages, requiring agents to agree on an etiquette before starting their interactions, which is feasible for instance in self-play. However, the agents only access to joint reward and joint next state and not to each other’s actions or policies. Our complexity results match the best-known results for global convergence of policy gradient algorithms for single agent RL. We provide numerical verification of our methods for a two player bandit environment and a two player game, Alesia. We observe improved empirical performance as compared to the recently proposed optimistic gradient descent-ascent variant for Markov games.

APA


Alacaoglu, A., Viano, L., He, N. & Cevher, V.. (2022). A Natural Actor-Critic Framework for Zero-Sum Markov Games. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:307-366 Available from https://proceedings.mlr.press/v162/alacaoglu22a.html.

Related Material

Download PDF