The Effectiveness of World Models for Continual Reinforcement Learning

Samuel Kessler, Mateusz Ostaszewski, Michał Paweł Bortkiewicz, Mateusz Żarski, Maciej Wolczyk, Jack Parker-Holder, Stephen J. Roberts, Piotr Miłoś
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:184-204, 2023.

Abstract

World models power some of the most efficient reinforcement learning algorithms. In this work, we show that they can be harnessed for continual learning, a setting in which the agent faces changing environments. World models typically employ a replay buffer for training, which can be naturally extended to continual learning. We systematically study how different selective experience replay methods affect performance, forgetting, and transfer. We also provide recommendations regarding various modeling options for using world models. We call the best set of choices Continual-Dreamer; it is task-agnostic and utilizes the world model for continual exploration. Continual-Dreamer is sample-efficient and outperforms state-of-the-art task-agnostic continual reinforcement learning methods on the Minigrid and Minihack benchmarks.
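To make the replay mechanism concrete, below is a minimal Python sketch of one selective experience replay strategy in the family the paper studies: reservoir sampling, which keeps every transition ever observed with equal probability and therefore requires no task boundary information. The class and method names are illustrative assumptions, not the authors' implementation.

    import random
    from dataclasses import dataclass, field

    @dataclass
    class ReservoirReplayBuffer:
        """Task-agnostic replay buffer using reservoir sampling.

        Every transition observed so far is retained with equal
        probability capacity / num_seen, so experience from earlier
        tasks survives in the buffer without any knowledge of task
        boundaries. Illustrative sketch only, not the paper's exact
        implementation.
        """
        capacity: int
        buffer: list = field(default_factory=list)
        num_seen: int = 0  # total transitions observed so far

        def add(self, transition):
            self.num_seen += 1
            if len(self.buffer) < self.capacity:
                # Buffer not yet full: always keep the transition.
                self.buffer.append(transition)
            else:
                # Keep the new transition with probability
                # capacity / num_seen (Algorithm R).
                idx = random.randrange(self.num_seen)
                if idx < self.capacity:
                    self.buffer[idx] = transition

        def sample(self, batch_size):
            # Uniform minibatch for world-model training.
            return random.sample(self.buffer, min(batch_size, len(self.buffer)))

Because the retention rule depends only on a running count of transitions, the same buffer can be fed by a stream of tasks without being told where one task ends and the next begins, which is what makes strategies of this kind attractive for task-agnostic continual world-model training.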

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-kessler23a,
  title     = {The Effectiveness of World Models for Continual Reinforcement Learning},
  author    = {Kessler, Samuel and Ostaszewski, Mateusz and Bortkiewicz, Micha{\l} Pawe{\l} and {\.Z}arski, Mateusz and Wolczyk, Maciej and Parker-Holder, Jack and Roberts, Stephen J. and Mi{\l}o{\'s}, Piotr},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {184--204},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/kessler23a/kessler23a.pdf},
  url       = {https://proceedings.mlr.press/v232/kessler23a.html}
}
Endnote
%0 Conference Paper
%T The Effectiveness of World Models for Continual Reinforcement Learning
%A Samuel Kessler
%A Mateusz Ostaszewski
%A Michał Paweł Bortkiewicz
%A Mateusz Żarski
%A Maciej Wolczyk
%A Jack Parker-Holder
%A Stephen J. Roberts
%A Piotr Miłoś
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-kessler23a
%I PMLR
%P 184--204
%U https://proceedings.mlr.press/v232/kessler23a.html
%V 232
APA
Kessler, S., Ostaszewski, M., Bortkiewicz, M., Żarski, M., Wolczyk, M., Parker-Holder, J., Roberts, S.J. & Miłoś, P. (2023). The Effectiveness of World Models for Continual Reinforcement Learning. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:184-204. Available from https://proceedings.mlr.press/v232/kessler23a.html.