Reactive Exploration to Cope With Non-Stationarity in Lifelong Reinforcement Learning

Christian Alexander Steinparz, Thomas Schmied, Fabian Paischer, Marius-Constantin Dinu, Vihang Prakash Patil, Angela Bitto-Nemling, Hamid Eghbal-Zadeh, Sepp Hochreiter
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:441-469, 2022.

Abstract

In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems, such as continual domain shifts, which result in non-stationary rewards and environmental dynamics. These non-stationarities, however, are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments to investigate different exploration strategies. We empirically show that policy-gradient algorithms are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Accordingly, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts.
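The abstract describes Reactive Exploration only at a high level. Below is a minimal sketch of the underlying idea: a learned forward model's prediction error serves as an intrinsic, shift-sensitive exploration signal that spikes when the environment dynamics change. The ForwardModel class, the linear dynamics model, and the mixing coefficient beta are illustrative assumptions, not the authors' implementation.

```python
# Minimal, illustrative sketch of a prediction-error-driven exploration bonus.
# This is NOT the paper's implementation; the forward model, environment, and
# reward mixing below are simplified assumptions for illustration only.
import numpy as np

class ForwardModel:
    """Tiny linear forward-dynamics model f(s, a) -> s' trained by SGD."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def update(self, state, action, next_state):
        x = np.concatenate([state, action])
        err = self.W @ x - next_state
        # Gradient step on 0.5 * ||prediction - next_state||^2
        self.W -= self.lr * np.outer(err, x)
        return 0.5 * float(err @ err)  # prediction error = intrinsic signal


def reactive_reward(extrinsic, intrinsic, beta=0.1):
    """Mix the task reward with the shift-sensitive intrinsic bonus."""
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = ForwardModel(state_dim=4, action_dim=2)
    dynamics = rng.normal(size=(4, 6))           # hidden environment dynamics
    for t in range(2000):
        if t == 1000:                            # continual domain shift
            dynamics += rng.normal(scale=0.5, size=dynamics.shape)
        s, a = rng.normal(size=4), rng.normal(size=2)
        s_next = dynamics @ np.concatenate([s, a])
        intrinsic = model.update(s, a, s_next)   # spikes after the shift
        r = reactive_reward(extrinsic=0.0, intrinsic=intrinsic)
        if t % 500 == 0:
            print(f"step {t}: intrinsic bonus {intrinsic:.3f}")
```

In a full agent, this bonus would be added to the task reward so that the policy is pushed back toward exploration whenever the prediction error, and hence the detected domain shift, increases.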

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-steinparz22a, title = {Reactive Exploration to Cope With Non-Stationarity in Lifelong Reinforcement Learning}, author = {Steinparz, Christian Alexander and Schmied, Thomas and Paischer, Fabian and Dinu, Marius-Constantin and Patil, Vihang Prakash and Bitto-Nemling, Angela and Eghbal-Zadeh, Hamid and Hochreiter, Sepp}, booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents}, pages = {441--469}, year = {2022}, editor = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina}, volume = {199}, series = {Proceedings of Machine Learning Research}, month = {22--24 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v199/steinparz22a/steinparz22a.pdf}, url = {https://proceedings.mlr.press/v199/steinparz22a.html}, abstract = {In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems, such as continual domain shifts, which result in non-stationary rewards and environmental dynamics. These non-stationarities, however, are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments to investigate different exploration strategies. We empirically show that policy-gradient algorithms are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Accordingly, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts.} }
Endnote
%0 Conference Paper %T Reactive Exploration to Cope With Non-Stationarity in Lifelong Reinforcement Learning %A Christian Alexander Steinparz %A Thomas Schmied %A Fabian Paischer %A Marius-Constantin Dinu %A Vihang Prakash Patil %A Angela Bitto-Nemling %A Hamid Eghbal-Zadeh %A Sepp Hochreiter %B Proceedings of The 1st Conference on Lifelong Learning Agents %C Proceedings of Machine Learning Research %D 2022 %E Sarath Chandar %E Razvan Pascanu %E Doina Precup %F pmlr-v199-steinparz22a %I PMLR %P 441--469 %U https://proceedings.mlr.press/v199/steinparz22a.html %V 199 %X In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems, such as continual domain shifts, which result in non-stationary rewards and environmental dynamics. These non-stationarities, however, are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments to investigate different exploration strategies. We empirically show that policy-gradient algorithms are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Accordingly, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts.
APA
Steinparz, C.A., Schmied, T., Paischer, F., Dinu, M.-C., Patil, V.P., Bitto-Nemling, A., Eghbal-Zadeh, H. & Hochreiter, S. (2022). Reactive Exploration to Cope With Non-Stationarity in Lifelong Reinforcement Learning. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:441-469. Available from https://proceedings.mlr.press/v199/steinparz22a.html.
