Policy Consolidation for Continual Reinforcement Learning

Christos Kaplanis, Murray Shanahan, Claudia Clopath
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:3242-3251, 2019.

Abstract

We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries and can adapt in continuously changing environments. In our policy consolidation model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent’s policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.
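The core idea can be illustrated with a short, self-contained sketch (not the authors' code): a cascade of Gaussian policy networks in which each policy is pulled toward its neighbour in the chain by a KL-divergence penalty whose strength grows down the cascade, so deeper policies change more slowly and act as a memory of the agent's past behaviour. The names (`PolicyNet`, `CASCADE_DEPTH`, `OMEGA`, `BETA`), the coefficient schedule, and the KL direction are illustrative assumptions rather than the paper's exact formulation, and the RL term is a stand-in for the PPO surrogate objective used in the paper.

```python
# Minimal sketch of policy consolidation: a chain of policies tied together
# by KL terms of geometrically increasing strength. Illustrative only.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

OBS_DIM, ACT_DIM = 8, 2
CASCADE_DEPTH = 4          # visible policy + 3 hidden policies (assumed depth)
OMEGA, BETA = 1.0, 4.0     # base KL weight and growth factor (assumed values)


class PolicyNet(nn.Module):
    """Small Gaussian policy for continuous control."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(),
                                  nn.Linear(64, ACT_DIM))
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))

    def dist(self, obs):
        return Normal(self.body(obs), self.log_std.exp())


cascade = nn.ModuleList(PolicyNet() for _ in range(CASCADE_DEPTH))
optim = torch.optim.Adam(cascade.parameters(), lr=3e-4)


def consolidation_loss(obs):
    """Sum of KL terms tying each policy to the next (slower) one in the chain."""
    loss = 0.0
    for k in range(CASCADE_DEPTH - 1):
        d_fast, d_slow = cascade[k].dist(obs), cascade[k + 1].dist(obs)
        weight = OMEGA * BETA ** k  # deeper links are stiffer, hence slower
        loss = loss + weight * kl_divergence(d_slow, d_fast).sum(-1).mean()
    return loss


# Training step sketch: the visible policy cascade[0] would normally be updated
# with the PPO surrogate loss; here a placeholder log-likelihood term stands in.
obs = torch.randn(32, OBS_DIM)
rl_loss = -cascade[0].dist(obs).log_prob(torch.zeros(32, ACT_DIM)).sum(-1).mean()
total = rl_loss + consolidation_loss(obs)
optim.zero_grad()
total.backward()
optim.step()
```

Because the consolidation term depends only on the agent's own recent states and policies, it requires no task labels or boundary signals, which is what makes the method timescale-agnostic.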

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-kaplanis19a,
  title     = {Policy Consolidation for Continual Reinforcement Learning},
  author    = {Kaplanis, Christos and Shanahan, Murray and Clopath, Claudia},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {3242--3251},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/kaplanis19a/kaplanis19a.pdf},
  url       = {https://proceedings.mlr.press/v97/kaplanis19a.html},
  abstract  = {We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries and can adapt in continuously changing environments. In our policy consolidation model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent's policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.}
}
Endnote
%0 Conference Paper
%T Policy Consolidation for Continual Reinforcement Learning
%A Christos Kaplanis
%A Murray Shanahan
%A Claudia Clopath
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-kaplanis19a
%I PMLR
%P 3242--3251
%U https://proceedings.mlr.press/v97/kaplanis19a.html
%V 97
%X We propose a method for tackling catastrophic forgetting in deep reinforcement learning that is agnostic to the timescale of changes in the distribution of experiences, does not require knowledge of task boundaries and can adapt in continuously changing environments. In our policy consolidation model, the policy network interacts with a cascade of hidden networks that simultaneously remember the agent's policy at a range of timescales and regularise the current policy by its own history, thereby improving its ability to learn without forgetting. We find that the model improves continual learning relative to baselines on a number of continuous control tasks in single-task, alternating two-task, and multi-agent competitive self-play settings.
APA
Kaplanis, C., Shanahan, M. & Clopath, C. (2019). Policy Consolidation for Continual Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:3242-3251. Available from https://proceedings.mlr.press/v97/kaplanis19a.html.
