Exploration Analysis in Finite-Horizon Turn-based Stochastic Games

Jialian Li; Yichi Zhou; Tongzheng Ren; Jun Zhu

Exploration Analysis in Finite-Horizon Turn-based Stochastic Games

Jialian Li, Yichi Zhou, Tongzheng Ren, Jun Zhu

Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:201-210, 2020.

Abstract

Exploration and exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on one-player Markov Decision Processes has reached near-optimal results for both PAC and high probability regret guarantees. However, such an analysis is lacking for the more complex stochastic games with multi-players, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for the $N$-player finite-horizon turn-based stochastic games (FTSG). We propose a framework, \textit{Upper Bounding the Values for Players} (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that players choose the optimal policy conditioning on the policies of the others simultaneously; thus players can explore \textit{in the face of uncertainty} and get close to the Nash Equilibrium. Based on UBVP, we present two provable algorithms. One is \textit{Uniform}-PAC with a sample complexity of $\tilde{O}(1/\epsilon^2)$ to get an $\epsilon$-Nash Equilibrium for arbitrary $\epsilon>0$, and the other has a cumulative exploitability of $\tilde{O}(\sqrt{T})$ with high probability.

Cite this Paper

BibTeX


@InProceedings{pmlr-v124-li20a,
  title = 	 {Exploration Analysis in Finite-Horizon Turn-based Stochastic Games},
  author =       {Li, Jialian and Zhou, Yichi and Ren, Tongzheng and Zhu, Jun},
  booktitle = 	 {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = 	 {201--210},
  year = 	 {2020},
  editor = 	 {Peters, Jonas and Sontag, David},
  volume = 	 {124},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {03--06 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v124/li20a/li20a.pdf},
  url = 	 {https://proceedings.mlr.press/v124/li20a.html},
  abstract = 	 {Exploration and exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on one-player Markov Decision Processes has reached near-optimal results for both PAC and high probability regret guarantees. However, such an analysis is lacking for the more complex stochastic games with multi-players, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for the $N$-player finite-horizon turn-based stochastic games (FTSG). We propose a framework, \textit{Upper Bounding the Values for Players} (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that players choose the optimal policy conditioning on the policies of the others simultaneously; thus players can explore \textit{in the face of uncertainty} and get close to the Nash Equilibrium. Based on UBVP, we present two provable algorithms. One is \textit{Uniform}-PAC with a sample complexity of $\tilde{O}(1/\epsilon^2)$ to get an $\epsilon$-Nash Equilibrium for arbitrary $\epsilon>0$, and the other has a cumulative exploitability of $\tilde{O}(\sqrt{T})$ with high probability.}
}

Endnote

%0 Conference Paper
%T Exploration Analysis in Finite-Horizon Turn-based Stochastic Games
%A Jialian Li
%A Yichi Zhou
%A Tongzheng Ren
%A Jun Zhu
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag	
%F pmlr-v124-li20a
%I PMLR
%P 201--210
%U https://proceedings.mlr.press/v124/li20a.html
%V 124
%X Exploration and exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on one-player Markov Decision Processes has reached near-optimal results for both PAC and high probability regret guarantees. However, such an analysis is lacking for the more complex stochastic games with multi-players, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for the $N$-player finite-horizon turn-based stochastic games (FTSG). We propose a framework, \textit{Upper Bounding the Values for Players} (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that players choose the optimal policy conditioning on the policies of the others simultaneously; thus players can explore \textit{in the face of uncertainty} and get close to the Nash Equilibrium. Based on UBVP, we present two provable algorithms. One is \textit{Uniform}-PAC with a sample complexity of $\tilde{O}(1/\epsilon^2)$ to get an $\epsilon$-Nash Equilibrium for arbitrary $\epsilon>0$, and the other has a cumulative exploitability of $\tilde{O}(\sqrt{T})$ with high probability.

APA


Li, J., Zhou, Y., Ren, T. & Zhu, J.. (2020). Exploration Analysis in Finite-Horizon Turn-based Stochastic Games. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:201-210 Available from https://proceedings.mlr.press/v124/li20a.html.

Exploration Analysis in Finite-Horizon Turn-based Stochastic Games

Abstract

Cite this Paper

Related Material