A Fast and Reliable Policy Improvement Algorithm

Yasin Abbasi-Yadkori; Peter L. Bartlett; Stephen J. Wright

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1338-1346, 2016.

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-abbasi-yadkori16,
  title = 	 {A Fast and Reliable Policy Improvement Algorithm},
  author = 	 {Abbasi-Yadkori, Yasin and Bartlett, Peter L. and Wright, Stephen J.},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1338--1346},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/abbasi-yadkori16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/abbasi-yadkori16.html},
  abstract = 	 {We introduce a simple, efficient method that improves stochastic policies for Markov decision processes.  The computational complexity is the same as that of the value estimation problem.  We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics.  Performance in numerical experiments compares favorably with previous policy improvement algorithms.}
}

Endnote

%0 Conference Paper
%T A Fast and Reliable Policy Improvement Algorithm
%A Yasin Abbasi-Yadkori
%A Peter L. Bartlett
%A Stephen J. Wright
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-abbasi-yadkori16
%I PMLR
%P 1338--1346
%U https://proceedings.mlr.press/v51/abbasi-yadkori16.html
%V 51
%X We introduce a simple, efficient method that improves stochastic policies for Markov decision processes.  The computational complexity is the same as that of the value estimation problem.  We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics.  Performance in numerical experiments compares favorably with previous policy improvement algorithms.

RIS


TY  - CPAPER
TI  - A Fast and Reliable Policy Improvement Algorithm
AU  - Yasin Abbasi-Yadkori
AU  - Peter L. Bartlett
AU  - Stephen J. Wright
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-abbasi-yadkori16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1338
EP  - 1346
L1  - http://proceedings.mlr.press/v51/abbasi-yadkori16.pdf
UR  - https://proceedings.mlr.press/v51/abbasi-yadkori16.html
AB  - We introduce a simple, efficient method that improves stochastic policies for Markov decision processes.  The computational complexity is the same as that of the value estimation problem.  We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics.  Performance in numerical experiments compares favorably with previous policy improvement algorithms.
ER  -

APA


Abbasi-Yadkori, Y., Bartlett, P.L. & Wright, S.J. (2016). A Fast and Reliable Policy Improvement Algorithm. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1338-1346. Available from https://proceedings.mlr.press/v51/abbasi-yadkori16.html.
