A Fast and Reliable Policy Improvement Algorithm

Yasin Abbasi-Yadkori, Peter L. Bartlett, Stephen J. Wright
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1338-1346, 2016.

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-abbasi-yadkori16,
  title = {A Fast and Reliable Policy Improvement Algorithm},
  author = {Abbasi-Yadkori, Yasin and Bartlett, Peter L. and Wright, Stephen J.},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {1338--1346},
  year = {2016},
  editor = {Gretton, Arthur and Robert, Christian C.},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/abbasi-yadkori16.pdf},
  url = {https://proceedings.mlr.press/v51/abbasi-yadkori16.html},
  abstract = {We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.}
}
Endnote
%0 Conference Paper
%T A Fast and Reliable Policy Improvement Algorithm
%A Yasin Abbasi-Yadkori
%A Peter L. Bartlett
%A Stephen J. Wright
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-abbasi-yadkori16
%I PMLR
%P 1338--1346
%U https://proceedings.mlr.press/v51/abbasi-yadkori16.html
%V 51
%X We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.
RIS
TY  - CPAPER
TI  - A Fast and Reliable Policy Improvement Algorithm
AU  - Yasin Abbasi-Yadkori
AU  - Peter L. Bartlett
AU  - Stephen J. Wright
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-abbasi-yadkori16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1338
EP  - 1346
L1  - http://proceedings.mlr.press/v51/abbasi-yadkori16.pdf
UR  - https://proceedings.mlr.press/v51/abbasi-yadkori16.html
AB  - We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.
ER  - 
APA
Abbasi-Yadkori, Y., Bartlett, P.L. & Wright, S.J. (2016). A Fast and Reliable Policy Improvement Algorithm. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1338-1346. Available from https://proceedings.mlr.press/v51/abbasi-yadkori16.html.