Exploration Analysis in Finite-Horizon Turn-based Stochastic Games
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:201-210, 2020.
Abstract
The exploration-exploitation trade-off is one of the key concerns in reinforcement learning. Previous work on single-player Markov Decision Processes has achieved near-optimal results for both PAC and high-probability regret guarantees. However, such an analysis is lacking for the more complex multi-player stochastic games, where all players aim to find an approximate Nash Equilibrium. In this work, we address the exploration issue for N-player finite-horizon turn-based stochastic games (FTSGs). We propose a framework, Upper Bounding the Values for Players (UBVP), to guide exploration in FTSGs. UBVP leverages the key insight that players choose the optimal policy conditioned on the policies of the others simultaneously; thus players can explore in the face of uncertainty and get close to the Nash Equilibrium. Based on UBVP, we present two provable algorithms. One is Uniform-PAC with a sample complexity of Õ(1/ϵ²) to find an ϵ-Nash Equilibrium for arbitrary ϵ > 0; the other achieves a cumulative exploitability of Õ(√T) with high probability.
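To make the stated guarantees concrete, the following is a minimal LaTeX sketch of the two solution-quality measures referenced above, the ϵ-Nash Equilibrium and cumulative exploitability, written in standard notation; the symbols V^i (player i's value under a joint policy π) and π^{-i} (the policies of all players other than i) are assumptions here, and the paper's exact formalization may differ.

\[
\max_{i \in [N]} \Bigl( \max_{\pi'^{i}} V^{i}\bigl(\pi'^{i}, \pi^{-i}\bigr) - V^{i}(\pi) \Bigr) \le \epsilon
\qquad \text{(\(\epsilon\)-Nash Equilibrium)}
\]

\[
\mathrm{Exploit}(T) \;=\; \sum_{t=1}^{T} \max_{i \in [N]} \Bigl( \max_{\pi'^{i}} V^{i}\bigl(\pi'^{i}, \pi_{t}^{-i}\bigr) - V^{i}(\pi_{t}) \Bigr)
\;=\; \tilde{O}\bigl(\sqrt{T}\bigr)
\quad \text{with high probability.}
\]

Under this reading, the Uniform-PAC guarantee bounds, for every ϵ > 0 simultaneously, the number of episodes in which the executed joint policy violates the first condition, while the second guarantee bounds the sum of per-episode deviation gaps over T episodes.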