An Inference-Based Policy Gradient Method for Learning Options

Matthew Smith, Herke van Hoof, Joelle Pineau
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4703-4712, 2018.

Abstract

In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides a formalism for such abstraction over sequences of decisions. However, most models require that options be given a priori, presumably specified by hand, which is neither efficient nor scalable. Indeed, it is preferable to learn options directly from interaction with the environment. Despite several efforts, this remains a difficult problem. In this work we develop a novel policy gradient method for the automatic learning of policies with options. This algorithm uses inference methods to simultaneously improve all of the options available to an agent, and thus can be employed in an off-policy manner, without observing option labels. The differentiable inference procedure employed yields options that can be easily interpreted. Empirical results confirm these attributes, and indicate that our algorithm has improved sample efficiency relative to the state-of-the-art in learning options end-to-end.
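As context for the abstract's claim that a differentiable inference procedure updates all options at once, here is a hedged sketch of the kind of computation involved; the notation is illustrative and not taken from the paper. Treating the active option o_t as a latent variable, with per-option action policies \pi(a \mid s, o) and a switching model p(o_t \mid o_{t-1}, s_t) that captures option terminations, the posterior over the active option follows an HMM-style forward recursion:

    \alpha_t(o) \;\propto\; \pi(a_t \mid s_t, o) \sum_{o'} p(o_t = o \mid o_{t-1} = o', s_t)\, \alpha_{t-1}(o')

The normalizer of this update is the marginal likelihood of the observed action, so the log-likelihood of a trajectory's actions is a differentiable function of every option's parameters. Differentiating through the recursion spreads credit across all options in proportion to their posterior probability, which is what would permit updating all options simultaneously, off-policy, without ever observing option labels.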

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-smith18a,
  title     = {An Inference-Based Policy Gradient Method for Learning Options},
  author    = {Smith, Matthew and van Hoof, Herke and Pineau, Joelle},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {4703--4712},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/smith18a/smith18a.pdf},
  url       = {https://proceedings.mlr.press/v80/smith18a.html}
}
Endnote
%0 Conference Paper
%T An Inference-Based Policy Gradient Method for Learning Options
%A Matthew Smith
%A Herke van Hoof
%A Joelle Pineau
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-smith18a
%I PMLR
%P 4703--4712
%U https://proceedings.mlr.press/v80/smith18a.html
%V 80
APA
Smith, M., van Hoof, H. & Pineau, J. (2018). An Inference-Based Policy Gradient Method for Learning Options. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:4703-4712. Available from https://proceedings.mlr.press/v80/smith18a.html.
