Policy and Value Transfer in Lifelong Reinforcement Learning

David Abel, Yuu Jinnai, Sophie Yue Guo, George Konidaris, Michael Littman
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:20-29, 2018.

Abstract

We consider the problem of how best to use prior experience to bootstrap lifelong learning, where an agent faces a series of task instances drawn from some task distribution. First, we identify the initial policy that optimizes expected performance over the distribution of tasks for increasingly complex classes of policy and task distributions. We empirically demonstrate the relative performance of each policy class’ optimal element in a variety of simple task distributions. We then consider value-function initialization methods that preserve PAC guarantees while simultaneously minimizing the learning required in two learning algorithms, yielding MaxQInit, a practical new method for value-function-based transfer. We show that MaxQInit performs well in simple lifelong RL experiments.
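As a rough illustration of the value-transfer idea summarized in the abstract, the sketch below initializes a new task's Q-function to the maximum optimal Q-value observed across previously solved tasks, falling back to an optimistic bound V_max when no prior tasks are available. The function name max_q_init, the dictionary-based Q representation, and the fallback rule are illustrative assumptions based only on the abstract's description; the paper itself specifies the exact conditions (e.g. how many tasks must be observed) under which this initialization preserves PAC guarantees.

    # Minimal sketch of the MaxQInit idea: optimistic Q-value initialization
    # for a new task, using Q-functions from previously solved tasks drawn
    # from the same task distribution. Names and structure are illustrative.
    from collections import defaultdict

    def max_q_init(solved_q_functions, v_max):
        """Build an initial Q-function for a new task.

        solved_q_functions: list of dicts mapping (state, action) -> optimal
                            Q-value from tasks solved earlier in the sequence.
        v_max:              upper bound on any Q-value (e.g. r_max / (1 - gamma)),
                            used when no tighter estimate is available.
        """
        if not solved_q_functions:
            # With no prior tasks, fall back to the standard optimistic bound.
            return defaultdict(lambda: v_max)

        init_q = defaultdict(lambda: v_max)
        keys = set().union(*(q.keys() for q in solved_q_functions))
        for sa in keys:
            # Take the max over previous tasks: with high probability this is
            # still optimistic once enough tasks have been sampled, but it is
            # much tighter than v_max, so less learning is wasted on the new task.
            init_q[sa] = max(q.get(sa, v_max) for q in solved_q_functions)
        return init_q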

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-abel18b,
  title     = {Policy and Value Transfer in Lifelong Reinforcement Learning},
  author    = {Abel, David and Jinnai, Yuu and Guo, Sophie Yue and Konidaris, George and Littman, Michael},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {20--29},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/abel18b/abel18b.pdf},
  url       = {https://proceedings.mlr.press/v80/abel18b.html}
}
Endnote
%0 Conference Paper
%T Policy and Value Transfer in Lifelong Reinforcement Learning
%A David Abel
%A Yuu Jinnai
%A Sophie Yue Guo
%A George Konidaris
%A Michael Littman
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-abel18b
%I PMLR
%P 20--29
%U https://proceedings.mlr.press/v80/abel18b.html
%V 80
APA
Abel, D., Jinnai, Y., Guo, S.Y., Konidaris, G. & Littman, M. (2018). Policy and Value Transfer in Lifelong Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:20-29. Available from https://proceedings.mlr.press/v80/abel18b.html.