Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning

Huaicheng Zhou, Zifeng Zhuang, Donglin Wang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:79644-79672, 2025.

Abstract

The integration of Deep Neural Networks in Reinforcement Learning (RL) systems has led to remarkable progress in solving complex tasks but also introduced challenges like primacy bias and dead neurons. Primacy bias skews learning towards early experiences, while dead neurons diminish the network’s capacity to acquire new knowledge. Traditional reset mechanisms aimed at addressing these issues often involve maintaining large replay buffers to train new networks or selectively resetting subsets of neurons. However, these approaches either incur prohibitive computational costs or reset network parameters without ensuring stability through recovery mechanisms, ultimately impairing learning efficiency. In this work, we introduce the novel concept of neuron regeneration, which combines reset mechanisms with knowledge recovery techniques. We also propose a new framework called Sustainable Backup Propagation (SBP) that effectively maintains plasticity in neural networks through this neuron regeneration process. The SBP framework achieves whole-network neuron regeneration through two key procedures: cycle reset and inner distillation. Cycle reset involves a scheduled renewal of neurons, while inner distillation functions as a knowledge recovery mechanism at the neuron level. To validate our framework, we integrate SBP with Proximal Policy Optimization (PPO) and propose a novel distillation function for inner distillation. This integration results in Plastic PPO (P3O), a new algorithm that enables efficient cyclic regeneration of all neurons in the actor network. Extensive experiments demonstrate that the approach effectively maintains policy plasticity and improves sample efficiency in reinforcement learning.
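
The core idea can be illustrated with a minimal sketch, assuming a small PyTorch actor network: layers are reinitialized on a fixed schedule (cycle reset), and the reinitialized network is then trained to match a frozen snapshot of its pre-reset self on recent observations (inner distillation) before on-policy training resumes. All names, the reset schedule, and the logit-matching MSE loss below are illustrative assumptions, not the paper's actual P3O implementation or distillation function.

```python
# Hypothetical sketch of cycle reset + inner distillation for an actor network.
# The class/function names, reset schedule, and loss are assumptions for illustration.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def cycle_reset(actor: Actor, layer_idx: int) -> Actor:
    """Reinitialize one layer of the actor (scheduled neuron renewal) and
    return a frozen copy of the pre-reset actor to serve as teacher."""
    teacher = copy.deepcopy(actor).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    actor.net[layer_idx].reset_parameters()  # fresh weights for the chosen layer
    return teacher

def inner_distillation(actor: Actor, teacher: Actor, obs_batch: torch.Tensor,
                       optimizer: torch.optim.Optimizer, steps: int = 100) -> None:
    """Recover knowledge after the reset by matching the teacher's outputs
    on recent observations (a simple MSE surrogate for knowledge recovery)."""
    for _ in range(steps):
        loss = nn.functional.mse_loss(actor(obs_batch), teacher(obs_batch))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Usage sketch: every `reset_period` PPO updates, reset the next layer in the
# cycle, distill against the frozen pre-reset snapshot, then resume training.
if __name__ == "__main__":
    actor = Actor(obs_dim=8, act_dim=2)
    optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)
    obs_batch = torch.randn(256, 8)          # stand-in for recent observations
    teacher = cycle_reset(actor, layer_idx=2)
    inner_distillation(actor, teacher, obs_batch, optimizer)
```

Cycling the reset over layers (or neuron groups) rather than reinitializing the whole network at once is what allows every neuron to be regenerated over training without discarding the current policy.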

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-zhou25am,
  title     = {Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning},
  author    = {Zhou, Huaicheng and Zhuang, Zifeng and Wang, Donglin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {79644--79672},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/zhou25am/zhou25am.pdf},
  url       = {https://proceedings.mlr.press/v267/zhou25am.html},
  abstract  = {The integration of Deep Neural Networks in Reinforcement Learning (RL) systems has led to remarkable progress in solving complex tasks but also introduced challenges like primacy bias and dead neurons. Primacy bias skews learning towards early experiences, while dead neurons diminish the network’s capacity to acquire new knowledge. Traditional reset mechanisms aimed at addressing these issues often involve maintaining large replay buffers to train new networks or selectively resetting subsets of neurons. However, these approaches either incur prohibitive computational costs or reset network parameters without ensuring stability through recovery mechanisms, ultimately impairing learning efficiency. In this work, we introduce the novel concept of neuron regeneration, which combines reset mechanisms with knowledge recovery techniques. We also propose a new framework called Sustainable Backup Propagation(SBP) that effectively maintains plasticity in neural networks through this neuron regeneration process. The SBP framework achieves whole network neuron regeneration through two key procedures: cycle reset and inner distillation. Cycle reset involves a scheduled renewal of neurons, while inner distillation functions as a knowledge recovery mechanism at the neuron level. To validate our framework, we integrate SBP with Proximal Policy Optimization (PPO) and propose a novel distillation function for inner distillation. This integration results in Plastic PPO (P3O), a new algorithm that enables efficient cyclic regeneration of all neurons in the actor network. Extensive experiments demonstrate the approach effectively maintains policy plasticity and improves sample efficiency in reinforcement learning.}
}
Endnote
%0 Conference Paper
%T Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning
%A Huaicheng Zhou
%A Zifeng Zhuang
%A Donglin Wang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-zhou25am
%I PMLR
%P 79644--79672
%U https://proceedings.mlr.press/v267/zhou25am.html
%V 267
%X The integration of Deep Neural Networks in Reinforcement Learning (RL) systems has led to remarkable progress in solving complex tasks but also introduced challenges like primacy bias and dead neurons. Primacy bias skews learning towards early experiences, while dead neurons diminish the network’s capacity to acquire new knowledge. Traditional reset mechanisms aimed at addressing these issues often involve maintaining large replay buffers to train new networks or selectively resetting subsets of neurons. However, these approaches either incur prohibitive computational costs or reset network parameters without ensuring stability through recovery mechanisms, ultimately impairing learning efficiency. In this work, we introduce the novel concept of neuron regeneration, which combines reset mechanisms with knowledge recovery techniques. We also propose a new framework called Sustainable Backup Propagation(SBP) that effectively maintains plasticity in neural networks through this neuron regeneration process. The SBP framework achieves whole network neuron regeneration through two key procedures: cycle reset and inner distillation. Cycle reset involves a scheduled renewal of neurons, while inner distillation functions as a knowledge recovery mechanism at the neuron level. To validate our framework, we integrate SBP with Proximal Policy Optimization (PPO) and propose a novel distillation function for inner distillation. This integration results in Plastic PPO (P3O), a new algorithm that enables efficient cyclic regeneration of all neurons in the actor network. Extensive experiments demonstrate the approach effectively maintains policy plasticity and improves sample efficiency in reinforcement learning.
APA
Zhou, H., Zhuang, Z. & Wang, D. (2025). Stay Hungry, Keep Learning: Sustainable Plasticity for Deep Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:79644-79672. Available from https://proceedings.mlr.press/v267/zhou25am.html.