Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Aaron Sidford; Mengdi Wang; Lin Yang; Yinyu Ye

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2992-3002, 2020.

Abstract

In this paper we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$, we provide an algorithm that computes an $\epsilon$-optimal strategy with high probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs), our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize near-optimal Q-learning-based algorithms for MDPs, in particular \cite{sidford2018near}, to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and of strategy-iteration or alternating-minimization-based approaches, and we hope it will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.
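To make the setting concrete, here is a minimal Python sketch of naive sampled value iteration for a turn-based zero-sum discounted game with a generative model. It is not the paper's algorithm and does not achieve the stated sample complexity; it only illustrates the max/min Bellman backup that the paper's Q-learning-based method builds on. All names (`sampled_value_iteration`, `is_max_state`, `sample_next`, the toy two-state game) are hypothetical and chosen for illustration.

```python
import random


def sampled_value_iteration(n_states, n_actions, is_max_state, reward,
                            sample_next, gamma, n_samples, n_iters, seed=0):
    """Naive sampled value iteration for a turn-based zero-sum game.

    is_max_state[s] is True if the maximizing player moves in state s,
    otherwise the minimizing player moves. sample_next(s, a, rng) draws
    one next state from the (unknown) transition function.
    """
    rng = random.Random(seed)
    v = [0.0] * n_states
    for _ in range(n_iters):
        new_v = []
        for s in range(n_states):
            q_values = []
            for a in range(n_actions):
                # Monte Carlo estimate of the expected next-state value.
                draws = [sample_next(s, a, rng) for _ in range(n_samples)]
                expected = sum(v[t] for t in draws) / n_samples
                q_values.append(reward(s, a) + gamma * expected)
            # The controlling player picks the best action for its side.
            new_v.append(max(q_values) if is_max_state[s] else min(q_values))
        v = new_v
    return v


# Toy game (an assumption for illustration): two states that alternate
# deterministically; state 0 belongs to the max player, state 1 to the
# min player, and action a yields reward a.
values = sampled_value_iteration(
    n_states=2,
    n_actions=2,
    is_max_state=[True, False],
    reward=lambda s, a: float(a),
    sample_next=lambda s, a, rng: 1 - s,
    gamma=0.5,
    n_samples=4,
    n_iters=60,
)
```

In this toy game the max player takes reward 1 and the min player takes reward 0, so the values satisfy v0 = 1 + 0.5 v1 and v1 = 0.5 v0, giving v0 = 4/3 and v1 = 2/3. Because the transitions are deterministic, sampling introduces no error here; with stochastic transitions the per-pair sample count would govern the accuracy.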

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-sidford20a,
  title = 	 {Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity},
  author =       {Sidford, Aaron and Wang, Mengdi and Yang, Lin and Ye, Yinyu},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {2992--3002},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/sidford20a/sidford20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/sidford20a.html},
  abstract = 	 {In this paper we settle  the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that computes an $\epsilon$-optimal strategy with high-probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action-pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs) our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize a near-optimal Q-learning based algorithms for MDP,  in particular \cite{sidford2018near},  to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and strategy iteration or alternating minimization based approaches and we hope will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.}
}

Endnote

%0 Conference Paper
%T Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
%A Aaron Sidford
%A Mengdi Wang
%A Lin Yang
%A Yinyu Ye
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-sidford20a
%I PMLR
%P 2992--3002
%U https://proceedings.mlr.press/v108/sidford20a.html
%V 108
%X In this paper we settle  the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$ we provide an algorithm that computes an $\epsilon$-optimal strategy with high-probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action-pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs) our runtime and sample complexities are optimal due to \cite{azar2013minimax}. We achieve our results by showing how to generalize a near-optimal Q-learning based algorithms for MDP,  in particular \cite{sidford2018near},  to two-player strategy computation algorithms. This overcomes limitations of standard Q-learning and strategy iteration or alternating minimization based approaches and we hope will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.

APA


Sidford, A., Wang, M., Yang, L., & Ye, Y. (2020). Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2992-3002. Available from https://proceedings.mlr.press/v108/sidford20a.html.
