Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Yulai Zhao; Yuandong Tian; Jason Lee; Simon Du

Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games

Yulai Zhao, Yuandong Tian, Jason Lee, Simon Du

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:2736-2761, 2022.

Abstract

Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.

Cite this Paper

BibTeX


@InProceedings{pmlr-v151-zhao22b,
  title = 	 { Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games },
  author =       {Zhao, Yulai and Tian, Yuandong and Lee, Jason and Du, Simon},
  booktitle = 	 {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2736--2761},
  year = 	 {2022},
  editor = 	 {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = 	 {151},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {28--30 Mar},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v151/zhao22b/zhao22b.pdf},
  url = 	 {https://proceedings.mlr.press/v151/zhao22b.html},
  abstract = 	 { Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games. }
}

Endnote

%0 Conference Paper
%T  Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games 
%A Yulai Zhao
%A Yuandong Tian
%A Jason Lee
%A Simon Du
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera	
%F pmlr-v151-zhao22b
%I PMLR
%P 2736--2761
%U https://proceedings.mlr.press/v151/zhao22b.html
%V 151
%X  Policy-based methods with function approximation are widely used for solving two-player zero-sum games with large state and/or action spaces. However, it remains elusive how to obtain optimization and statistical guarantees for such algorithms. We present a new policy optimization algorithm with function approximation and prove that under standard regularity conditions on the Markov game and the function approximation class, our algorithm finds a near-optimal policy within a polynomial number of samples and iterations. To our knowledge, this is the first provably efficient policy optimization algorithm with function approximation that solves two-player zero-sum Markov games.

APA


Zhao, Y., Tian, Y., Lee, J. & Du, S.. (2022).  Provably Efficient Policy Optimization for Two-Player Zero-Sum Markov Games . Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:2736-2761 Available from https://proceedings.mlr.press/v151/zhao22b.html.

Related Material

Download PDF