Block Contextual MDPs for Continual Learning

Shagun Sodhani, Franziska Meier, Joelle Pineau, Amy Zhang
Proceedings of The 4th Annual Learning for Dynamics and Control Conference, PMLR 168:608-623, 2022.

Abstract

In reinforcement learning (RL), when defining a Markov Decision Process (MDP), the environment dynamics are implicitly assumed to be stationary. This assumption of stationarity, while simplifying, can be unrealistic in many scenarios. In the continual reinforcement learning scenario, the sequence of tasks is another source of nonstationarity. In this work, we propose to examine this continual reinforcement learning setting through the Block Contextual MDP (BC-MDP) framework, which enables us to relax the assumption of stationarity. This framework challenges RL algorithms to handle both nonstationarity and rich observation settings and, by additionally leveraging smoothness properties, enables us to study generalization bounds for this setting. Finally, we take inspiration from adaptive control to propose a novel algorithm that addresses the challenges introduced by this more realistic BC-MDP setting, allows for zero-shot adaptation at evaluation time, and achieves strong performance on several nonstationary environments.
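To make the framework concrete, the sketch below shows one way a Block Contextual MDP could be represented in code, assuming the standard contextual-MDP formalism (a context indexes the dynamics and reward, and an emission function maps latent states to rich observations). This is an illustrative sketch only; the class and function names are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a Block Contextual MDP (BC-MDP).
# Assumption: a context indexes the transition dynamics and reward, and a
# block-structured emission function produces a rich observation from each
# latent state. Names and shapes here are illustrative, not the paper's API.
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class BlockContextualMDP:
    """Family of MDPs indexed by a (possibly time-varying) context."""
    n_states: int
    n_actions: int
    # context -> transition tensor P[s, a, s'] over latent states
    dynamics: Callable[[float], np.ndarray]
    # context -> reward matrix R[s, a]
    reward: Callable[[float], np.ndarray]
    # latent state -> rich observation (block structure: each observation
    # is generated by exactly one latent state)
    emission: Callable[[int], np.ndarray]

    def step(self, state: int, action: int, context: float,
             rng: np.random.Generator):
        """Sample one transition of the MDP selected by `context`."""
        P = self.dynamics(context)   # shape (S, A, S)
        R = self.reward(context)     # shape (S, A)
        next_state = rng.choice(self.n_states, p=P[state, action])
        return self.emission(next_state), R[state, action], next_state
```

In a continual-learning setting of the kind the abstract describes, the context passed to `step` would drift or switch over time, so the agent faces nonstationary dynamics and rewards while only ever seeing the rich observations.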

Cite this Paper


BibTeX
@InProceedings{pmlr-v168-sodhani22a,
  title     = {Block Contextual MDPs for Continual Learning},
  author    = {Sodhani, Shagun and Meier, Franziska and Pineau, Joelle and Zhang, Amy},
  booktitle = {Proceedings of The 4th Annual Learning for Dynamics and Control Conference},
  pages     = {608--623},
  year      = {2022},
  editor    = {Firoozi, Roya and Mehr, Negar and Yel, Esen and Antonova, Rika and Bohg, Jeannette and Schwager, Mac and Kochenderfer, Mykel},
  volume    = {168},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--24 Jun},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v168/sodhani22a/sodhani22a.pdf},
  url       = {https://proceedings.mlr.press/v168/sodhani22a.html}
}
APA
Sodhani, S., Meier, F., Pineau, J., & Zhang, A. (2022). Block Contextual MDPs for Continual Learning. Proceedings of The 4th Annual Learning for Dynamics and Control Conference, in Proceedings of Machine Learning Research 168:608-623. Available from https://proceedings.mlr.press/v168/sodhani22a.html.
