Loss of Plasticity in Continual Deep Reinforcement Learning

Zaheer Abbas, Rosie Zhao, Joseph Modayil, Adam White, Marlos C. Machado
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:620-636, 2023.

Abstract

In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises—e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying experimental conditions (e.g., similarity between games, number of games, number of frames per game). Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy—Concatenated ReLUs (CReLUs) activation function—and demonstrate its effectiveness in facilitating continual learning in a changing environment.
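As a quick illustration of the mitigation the abstract refers to, below is a minimal sketch of a Concatenated ReLU (CReLU) activation: it concatenates ReLU applied to the input and to its negation along the feature axis, so at least one of the paired units is active for any input. The use of PyTorch and the class and argument names are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class CReLU(nn.Module):
        """Concatenated ReLU: returns [ReLU(x), ReLU(-x)] along the feature axis.

        Note: the output has twice as many features as the input, so the next
        layer's input width must be doubled accordingly. PyTorch and the names
        used here are illustrative, not the paper's implementation.
        """
        def __init__(self, dim: int = 1):
            super().__init__()
            self.dim = dim

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return torch.cat((torch.relu(x), torch.relu(-x)), dim=self.dim)

Because the concatenation doubles the activation width, swapping CReLU in for ReLU in a DQN-style network means either halving the per-layer width or accepting roughly twice as many parameters in the following layer.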

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-abbas23a,
  title     = {Loss of Plasticity in Continual Deep Reinforcement Learning},
  author    = {Abbas, Zaheer and Zhao, Rosie and Modayil, Joseph and White, Adam and Machado, Marlos C.},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {620--636},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/abbas23a/abbas23a.pdf},
  url       = {https://proceedings.mlr.press/v232/abbas23a.html},
  abstract  = {In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises—e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying experimental conditions (e.g., similarity between games, number of games, number of frames per game). Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy—Concatenated ReLUs (CReLUs) activation function—and demonstrate its effectiveness in facilitating continual learning in a changing environment.}
}
Endnote
%0 Conference Paper
%T Loss of Plasticity in Continual Deep Reinforcement Learning
%A Zaheer Abbas
%A Rosie Zhao
%A Joseph Modayil
%A Adam White
%A Marlos C. Machado
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-abbas23a
%I PMLR
%P 620--636
%U https://proceedings.mlr.press/v232/abbas23a.html
%V 232
%X In this paper, we characterize the behavior of canonical value-based deep reinforcement learning (RL) approaches under varying degrees of non-stationarity. In particular, we demonstrate that deep RL agents lose their ability to learn good policies when they cycle through a sequence of Atari 2600 games. This phenomenon is alluded to in prior work under various guises—e.g., loss of plasticity, implicit under-parameterization, primacy bias, and capacity loss. We investigate this phenomenon closely at scale and analyze how the weights, gradients, and activations change over time in several experiments with varying experimental conditions (e.g., similarity between games, number of games, number of frames per game). Our analysis shows that the activation footprint of the network becomes sparser, contributing to the diminishing gradients. We investigate a remarkably simple mitigation strategy—Concatenated ReLUs (CReLUs) activation function—and demonstrate its effectiveness in facilitating continual learning in a changing environment.
APA
Abbas, Z., Zhao, R., Modayil, J., White, A., & Machado, M. C. (2023). Loss of Plasticity in Continual Deep Reinforcement Learning. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:620-636. Available from https://proceedings.mlr.press/v232/abbas23a.html.
