Minimax sample complexity for turn-based stochastic game

Qiwen Cui, Lin F. Yang
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:1496-1504, 2021.

Abstract

The empirical success of multi-agent reinforcement learning is encouraging, yet few theoretical guarantees have been established. In this work, we prove that the plug-in solver approach, arguably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSGs). Specifically, we perform planning in an empirical TBSG constructed by querying a ‘simulator’ that allows sampling from any state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG, and we give both problem-dependent and problem-independent bounds. We develop reward perturbation techniques to tackle the non-stationarity in the game and a Taylor-expansion-type analysis to improve the dependence on the approximation error. With these novel techniques, we prove the minimax sample complexity of turn-based stochastic games.
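
To make the plug-in solver pipeline concrete, the following is a minimal sketch of the approach the abstract describes: query the simulator a fixed number of times per state-action pair to build an empirical TBSG, plan in that empirical model (here with value iteration under the turn-based Bellman operator), and return the resulting empirical Nash equilibrium strategy. The function and variable names (sample_transition, owner, the choice of planner) are illustrative assumptions, not the paper's implementation; the reward perturbation and Taylor-expansion techniques belong to the analysis, not to the algorithm itself.

# Sketch of the plug-in (model-based) solver for a discounted two-player TBSG.
# Assumed interface (hypothetical): sample_transition(s, a) returns a next state
# drawn from the true TBSG; owner[s] is 0 if state s is controlled by the
# maximizing player and 1 if controlled by the minimizing player.
import numpy as np

def plug_in_solver(sample_transition, reward, owner, n_states, n_actions,
                   gamma=0.99, n_samples=1000, n_iters=2000):
    # Step 1: build the empirical transition model from n_samples generative
    # queries per state-action pair.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_transition(s, a)] += 1.0
    p_hat /= n_samples

    # Step 2: plan in the empirical TBSG via value iteration with the
    # turn-based Bellman operator (max over actions in the max player's
    # states, min in the min player's states).
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = reward + gamma * p_hat @ v              # shape (n_states, n_actions)
        v = np.where(owner == 0, q.max(axis=1), q.min(axis=1))

    # Step 3: return the empirical Nash equilibrium strategy (greedy in q).
    q = reward + gamma * p_hat @ v
    policy = np.where(owner == 0, q.argmax(axis=1), q.argmin(axis=1))
    return policy, v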

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-cui21a,
  title     = {Minimax sample complexity for turn-based stochastic game},
  author    = {Cui, Qiwen and Yang, Lin F.},
  booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence},
  pages     = {1496--1504},
  year      = {2021},
  editor    = {de Campos, Cassio and Maathuis, Marloes H.},
  volume    = {161},
  series    = {Proceedings of Machine Learning Research},
  month     = {27--30 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v161/cui21a/cui21a.pdf},
  url       = {https://proceedings.mlr.press/v161/cui21a.html}
}
Endnote
%0 Conference Paper
%T Minimax sample complexity for turn-based stochastic game
%A Qiwen Cui
%A Lin F. Yang
%B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2021
%E Cassio de Campos
%E Marloes H. Maathuis
%F pmlr-v161-cui21a
%I PMLR
%P 1496--1504
%U https://proceedings.mlr.press/v161/cui21a.html
%V 161
APA
Cui, Q. & Yang, L. F. (2021). Minimax sample complexity for turn-based stochastic game. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:1496-1504. Available from https://proceedings.mlr.press/v161/cui21a.html.
