On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games

Julien Pérolat; Bilal Piot; Bruno Scherrer; Olivier Pietquin

On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games

Julien Pérolat, Bilal Piot, Bruno Scherrer, Olivier Pietquin

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:893-901, 2016.

Abstract

The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on γ(discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on γ) compared to those suffering from a regression error.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-perolat16,
  title = 	 {On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games},
  author = 	 {Pérolat, Julien and Piot, Bilal and Scherrer, Bruno and Pietquin, Olivier},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {893--901},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/perolat16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/perolat16.html},
  abstract = 	 {The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on γ(discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on γ) compared to those suffering from a regression error.}
}

Endnote

%0 Conference Paper
%T On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
%A Julien Pérolat
%A Bilal Piot
%A Bruno Scherrer
%A Olivier Pietquin
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert	
%F pmlr-v51-perolat16
%I PMLR
%P 893--901
%U https://proceedings.mlr.press/v51/perolat16.html
%V 51
%X The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on γ(discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on γ) compared to those suffering from a regression error.

RIS


TY  - CPAPER
TI  - On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games
AU  - Julien Pérolat
AU  - Bilal Piot
AU  - Bruno Scherrer
AU  - Olivier Pietquin
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert	
ID  - pmlr-v51-perolat16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 893
EP  - 901
L1  - http://proceedings.mlr.press/v51/perolat16.pdf
UR  - https://proceedings.mlr.press/v51/perolat16.html
AB  - The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on γ(discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on γ) compared to those suffering from a regression error.
ER  -

APA


Pérolat, J., Piot, B., Scherrer, B. & Pietquin, O.. (2016). On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:893-901 Available from https://proceedings.mlr.press/v51/perolat16.html.

On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games

Abstract

Cite this Paper

Related Material