Softened Approximate Policy Iteration for Markov Games

Julien Pérolat; Bilal Piot; Matthieu Geist; Bruno Scherrer; Olivier Pietquin

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:1860-1868, 2016.

Abstract

This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results in the Bellman Residual Minimization Policy Iteration (BRMPI) algorithm and, when applied to the norm of the Projected OBR (POBR), it results in the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed that use quasi-Newton methods to minimize the OBR and the POBR, so as to benefit from enhanced empirical performance at low cost. Indeed, a quasi-Newton approach requires only slight modifications to the code of LSPI and BRMPI but significantly improves both the stability and the performance of those algorithms. These phenomena are illustrated in an experiment conducted on artificially constructed games called Garnets.
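The link between Newton's method and policy iteration mentioned in the abstract can be illustrated on a much simpler case. The sketch below is not the paper's algorithm: it assumes a single-player MDP (no opponent, so no matrix game has to be solved in each state), uses exact tabular values rather than the least-squares setting of LSPI or BRMPI, and runs on a hypothetical two-state toy problem. The names `newton_step`, `residual_norm`, `P`, `R` and `GAMMA` are invented for this example. In this setting a full Newton step on the residual T v - v is exactly the evaluation of the greedy policy, which is one round of policy iteration. The `alpha` parameter damps the step; it is only a stand-in for the idea of a softened update and should not be read as the quasi-Newton scheme the paper proposes.

```python
# Illustrative sketch only: single-player MDP, not the two-player algorithms of the paper.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def q_values(P, R, gamma, v):
    """q[s][a] = R[s][a] + gamma * sum_t P[a][s][t] * v[t]."""
    n, m = len(R), len(R[0])
    return [[R[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
             for a in range(m)] for s in range(n)]

def residual_norm(P, R, gamma, v):
    """Max-norm of the optimal Bellman residual T v - v."""
    q = q_values(P, R, gamma, v)
    return max(abs(max(q[s]) - v[s]) for s in range(len(v)))

def newton_step(P, R, gamma, v, alpha=1.0):
    """One (optionally damped) Newton step on F(v) = T v - v.

    At v, the Jacobian of F is gamma * P_pi - I, with pi greedy w.r.t. v,
    so the Newton direction d solves (I - gamma * P_pi) d = T v - v.
    With alpha = 1 the result is the value of pi (policy iteration).
    """
    n = len(v)
    q = q_values(P, R, gamma, v)
    pi = [max(range(len(q[s])), key=lambda a: q[s][a]) for s in range(n)]
    residual = [q[s][pi[s]] - v[s] for s in range(n)]
    A = [[(1.0 if s == t else 0.0) - gamma * P[pi[s]][s][t]
          for t in range(n)] for s in range(n)]
    d = solve(A, residual)
    return [v[s] + alpha * d[s] for s in range(n)], pi

# Hypothetical toy MDP: P[a][s][t] is a transition probability, R[s][a] a reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.1, 0.9], [0.7, 0.3]]]
R = [[1.0, 0.0], [0.0, 2.0]]
GAMMA = 0.9

# Full Newton steps (alpha = 1): plain policy iteration.
v_newton = [0.0, 0.0]
for _ in range(10):
    v_newton, greedy = newton_step(P, R, GAMMA, v_newton)

# Damped steps (alpha = 0.5): smaller moves along the same direction.
v_damped = [0.0, 0.0]
for _ in range(500):
    v_damped, greedy_damped = newton_step(P, R, GAMMA, v_damped, alpha=0.5)

print("residual after full Newton steps:", residual_norm(P, R, GAMMA, v_newton))
print("gap between the two solutions:",
      max(abs(a - b) for a, b in zip(v_newton, v_damped)))
```

Both runs should drive the residual toward zero and agree on the value function; the damped run simply needs more iterations. Extending this to a zero-sum Markov game would replace the `max` over actions with the value of a matrix game in each state, which is outside the scope of this sketch.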

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-perolat16,
  title = 	 {Softened Approximate Policy Iteration for Markov Games},
  author = 	 {Pérolat, Julien and Piot, Bilal and Geist, Matthieu and Scherrer, Bruno and Pietquin, Olivier},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {1860--1868},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/perolat16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/perolat16.html},
  abstract = 	 {This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results in the Bellman Residual Minimization Policy Iteration (BRMPI) and, when applied to the norm of the Projected OBR (POBR), it results into the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed, making use of quasi-Newton methods to minimize the OBR and the POBR so as to take benefit of enhanced empirical performances at low cost. Indeed, using a quasi-Newton method approach introduces slight modifications in term of coding of LSPI and BRMPI but improves significantly both the stability and the performance of those algorithms. These phenomena are illustrated on an experiment conducted on artificially constructed games called Garnets.}
}

Endnote

%0 Conference Paper
%T Softened Approximate Policy Iteration for Markov Games
%A Julien Pérolat
%A Bilal Piot
%A Matthieu Geist
%A Bruno Scherrer
%A Olivier Pietquin
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-perolat16
%I PMLR
%P 1860--1868
%U https://proceedings.mlr.press/v48/perolat16.html
%V 48
%X This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results in the Bellman Residual Minimization Policy Iteration (BRMPI) and, when applied to the norm of the Projected OBR (POBR), it results into the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed, making use of quasi-Newton methods to minimize the OBR and the POBR so as to take benefit of enhanced empirical performances at low cost. Indeed, using a quasi-Newton method approach introduces slight modifications in term of coding of LSPI and BRMPI but improves significantly both the stability and the performance of those algorithms. These phenomena are illustrated on an experiment conducted on artificially constructed games called Garnets.

RIS


TY  - CPAPER
TI  - Softened Approximate Policy Iteration for Markov Games
AU  - Julien Pérolat
AU  - Bilal Piot
AU  - Matthieu Geist
AU  - Bruno Scherrer
AU  - Olivier Pietquin
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-perolat16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 1860
EP  - 1868
L1  - http://proceedings.mlr.press/v48/perolat16.pdf
UR  - https://proceedings.mlr.press/v48/perolat16.html
AB  - This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results in the Bellman Residual Minimization Policy Iteration (BRMPI) and, when applied to the norm of the Projected OBR (POBR), it results into the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed, making use of quasi-Newton methods to minimize the OBR and the POBR so as to take benefit of enhanced empirical performances at low cost. Indeed, using a quasi-Newton method approach introduces slight modifications in term of coding of LSPI and BRMPI but improves significantly both the stability and the performance of those algorithms. These phenomena are illustrated on an experiment conducted on artificially constructed games called Garnets.
ER  -

APA


Pérolat, J., Piot, B., Geist, M., Scherrer, B. & Pietquin, O. (2016). Softened Approximate Policy Iteration for Markov Games. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1860-1868. Available from https://proceedings.mlr.press/v48/perolat16.html.
