Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Julien Perolat; Bruno Scherrer; Bilal Piot; Olivier Pietquin

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Julien Perolat, Bruno Scherrer, Bilal Piot, Olivier Pietquin

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1321-1329, 2015.

Abstract

This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is \frac2γ(1 - γ)^2 ε+ \frac1(1 - γ)^2ε’-optimal, where εis the value function approximation error and ε’ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-perolat15,
  title = 	 {Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games},
  author = 	 {Perolat, Julien and Scherrer, Bruno and Piot, Bilal and Pietquin, Olivier},
  booktitle = 	 {Proceedings of the 32nd International Conference on Machine Learning},
  pages = 	 {1321--1329},
  year = 	 {2015},
  editor = 	 {Bach, Francis and Blei, David},
  volume = 	 {37},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Lille, France},
  month = 	 {07--09 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v37/perolat15.pdf},
  url = 	 {https://proceedings.mlr.press/v37/perolat15.html},
  abstract = 	 {This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is \frac2γ(1 - γ)^2 ε+ \frac1(1 - γ)^2ε’-optimal, where εis the value function approximation error and ε’ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.}
}

Endnote

%0 Conference Paper
%T Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
%A Julien Perolat
%A Bruno Scherrer
%A Bilal Piot
%A Olivier Pietquin
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei	
%F pmlr-v37-perolat15
%I PMLR
%P 1321--1329
%U https://proceedings.mlr.press/v37/perolat15.html
%V 37
%X This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is \frac2γ(1 - γ)^2 ε+ \frac1(1 - γ)^2ε’-optimal, where εis the value function approximation error and ε’ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.

RIS


TY  - CPAPER
TI  - Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
AU  - Julien Perolat
AU  - Bruno Scherrer
AU  - Bilal Piot
AU  - Olivier Pietquin
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei	
ID  - pmlr-v37-perolat15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1321
EP  - 1329
L1  - http://proceedings.mlr.press/v37/perolat15.pdf
UR  - https://proceedings.mlr.press/v37/perolat15.html
AB  - This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is \frac2γ(1 - γ)^2 ε+ \frac1(1 - γ)^2ε’-optimal, where εis the value function approximation error and ε’ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decisions Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
ER  -

APA


Perolat, J., Scherrer, B., Piot, B. & Pietquin, O.. (2015). Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1321-1329 Available from https://proceedings.mlr.press/v37/perolat15.html.

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

Abstract

Cite this Paper

Related Material