Beyond the One-Step Greedy Approach in Reinforcement Learning

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1387-1396, 2018.

Abstract

The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not yet been carefully analyzed. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
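
A minimal sketch of the distinction the abstract draws, in standard MDP notation (the symbols $v$, $T^{\pi}$, $\gamma$, and the horizon $h$ are assumptions made here for illustration, not taken from this page): the usual improvement step is 1-step greedy with respect to a value function $v$, whereas a multiple-step (here, $h$-step) improvement looks $h$ steps ahead before bootstrapping on $v$.

\[
\pi_{1} \in \arg\max_{\pi} \; T^{\pi} v \;=\; \arg\max_{\pi} \left( r^{\pi} + \gamma P^{\pi} v \right),
\]
\[
\pi_{h}(s) \in \arg\max_{a_0} \; \max_{a_{1:h-1}} \;
\mathbb{E}\!\left[ \sum_{t=0}^{h-1} \gamma^{t} r(s_t, a_t) + \gamma^{h} v(s_h) \;\middle|\; s_0 = s \right].
\]

For $h = 1$ the two coincide; a Policy-Iteration-style scheme then alternates such an improvement step with evaluation of the improved policy.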

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-efroni18a,
  title     = {Beyond the One-Step Greedy Approach in Reinforcement Learning},
  author    = {Efroni, Yonathan and Dalal, Gal and Scherrer, Bruno and Mannor, Shie},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {1387--1396},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/efroni18a/efroni18a.pdf},
  url       = {https://proceedings.mlr.press/v80/efroni18a.html},
  abstract  = {The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not yet been carefully analyzed. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.}
}
Endnote
%0 Conference Paper
%T Beyond the One-Step Greedy Approach in Reinforcement Learning
%A Yonathan Efroni
%A Gal Dalal
%A Bruno Scherrer
%A Shie Mannor
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-efroni18a
%I PMLR
%P 1387--1396
%U https://proceedings.mlr.press/v80/efroni18a.html
%V 80
%X The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not yet been carefully analyzed. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
APA
Efroni, Y., Dalal, G., Scherrer, B. & Mannor, S. (2018). Beyond the One-Step Greedy Approach in Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:1387-1396. Available from https://proceedings.mlr.press/v80/efroni18a.html.

Related Material