Deep Reinforcement Learning amidst Continual Structured Non-Stationarity

Annie Xie, James Harrison, Chelsea Finn
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:11393-11403, 2021.

Abstract

As humans, our goals and our environment are persistently changing throughout our lifetime based on our experiences, actions, and internal and external drives. In contrast, typical reinforcement learning problem set-ups consider decision processes that are stationary across episodes. Can we develop reinforcement learning algorithms that can cope with the persistent change in the former, more realistic problem settings? While on-policy algorithms such as policy gradients in principle can be extended to non-stationary settings, the same cannot be said for more efficient off-policy algorithms that replay past experiences when learning. In this work, we formalize this problem setting, and draw upon ideas from the online learning and probabilistic inference literature to derive an off-policy RL algorithm that can reason about and tackle such lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences, and performs off-policy RL with this representation. We further introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
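To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of conditioning an off-policy critic on a latent representation of the environment inferred from recent experience. It is not the authors' implementation; all module names, dimensions, and the simple Gaussian trajectory encoder are illustrative assumptions.

# Hypothetical sketch: infer a latent z describing the current (possibly shifted)
# environment from a window of transitions, then feed z to an off-policy Q-function.
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Maps a batch of (state, action, reward, next_state) transitions to a Gaussian over z."""
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=128):
        super().__init__()
        in_dim = 2 * obs_dim + act_dim + 1
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, transitions):                 # transitions: (batch, T, in_dim)
        h = self.net(transitions).mean(dim=1)       # pool over the trajectory
        return self.mu(h), self.log_std(h).exp()

class ConditionedQ(nn.Module):
    """Q-function that takes the inferred latent as an extra input."""
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, act, z):
        return self.net(torch.cat([obs, act, z], dim=-1))

# Usage: encode recent experience, sample z via the reparameterization trick,
# and evaluate the latent-conditioned critic (dimensions are arbitrary here).
obs_dim, act_dim = 11, 3
encoder, critic = TrajectoryEncoder(obs_dim, act_dim), ConditionedQ(obs_dim, act_dim)
transitions = torch.randn(32, 10, 2 * obs_dim + act_dim + 1)   # fake replay batch
mu, std = encoder(transitions)
z = mu + std * torch.randn_like(std)
q_values = critic(torch.randn(32, obs_dim), torch.randn(32, act_dim), z)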

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-xie21c,
  title     = {Deep Reinforcement Learning amidst Continual Structured Non-Stationarity},
  author    = {Xie, Annie and Harrison, James and Finn, Chelsea},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11393--11403},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/xie21c/xie21c.pdf},
  url       = {https://proceedings.mlr.press/v139/xie21c.html}
}
Endnote
%0 Conference Paper
%T Deep Reinforcement Learning amidst Continual Structured Non-Stationarity
%A Annie Xie
%A James Harrison
%A Chelsea Finn
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-xie21c
%I PMLR
%P 11393--11403
%U https://proceedings.mlr.press/v139/xie21c.html
%V 139
APA
Xie, A., Harrison, J., & Finn, C. (2021). Deep Reinforcement Learning amidst Continual Structured Non-Stationarity. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:11393-11403. Available from https://proceedings.mlr.press/v139/xie21c.html.