Self-Composing Policies for Scalable Continual Reinforcement Learning

Mikel Malagon, Josu Ceberio, Jose A. Lozano
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:34432-34460, 2024.

Abstract

This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task. Unlike previous growing neural network approaches, we show that the number of parameters of the proposed approach grows linearly with respect to the number of tasks, and does not sacrifice plasticity to scale. Experiments conducted in benchmark continuous control and visual problems reveal that the proposed approach achieves greater knowledge transfer and performance than alternative methods.
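To illustrate the idea described in the abstract, the following is a minimal sketch (not the authors' implementation; all class and variable names are hypothetical). Each new module keeps an internal policy and attends over the outputs of all previously learned, frozen policies. Because the attention scores are computed from the policy outputs through fixed-size projections, each module adds a constant number of parameters, so the total grows linearly with the number of tasks:

```python
import numpy as np

class ComposingModule:
    """Illustrative module: an internal linear policy plus attention over
    the outputs of all previously learned (frozen) policies."""

    def __init__(self, obs_dim, act_dim, prev_modules, d=4, seed=0):
        rng = np.random.default_rng(seed)
        self.prev = prev_modules                           # frozen earlier policies
        self.W = rng.normal(0.0, 0.1, (act_dim, obs_dim))  # internal linear policy
        self.K = rng.normal(0.0, 0.1, (d, act_dim))        # shared key projection
        self.q = rng.normal(0.0, 0.1, d)                   # attention query

    def act(self, obs):
        # Candidate outputs: every previous policy plus this module's own.
        outs = [m.act(obs) for m in self.prev] + [self.W @ obs]
        # Scores come from the outputs themselves, so the parameter count
        # does not depend on how many predecessor modules exist.
        scores = np.array([self.q @ (self.K @ o) for o in outs])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return sum(wi * oi for wi, oi in zip(w, outs))

    def n_params(self):
        return self.W.size + self.K.size + self.q.size

# Grow one module per task; each adds a constant number of parameters.
modules = []
for _ in range(4):
    modules.append(ComposingModule(obs_dim=8, act_dim=2, prev_modules=list(modules)))

print([m.n_params() for m in modules])  # every module is the same size
```

Since earlier modules are frozen and only reused through their outputs, training the newest module cannot overwrite previous policies, which is how this style of architecture sidesteps catastrophic forgetting.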

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-malagon24a,
  title     = {Self-Composing Policies for Scalable Continual Reinforcement Learning},
  author    = {Malagon, Mikel and Ceberio, Josu and Lozano, Jose A.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {34432--34460},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/malagon24a/malagon24a.pdf},
  url       = {https://proceedings.mlr.press/v235/malagon24a.html},
  abstract  = {This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task. Unlike previous growing neural network approaches, we show that the number of parameters of the proposed approach grows linearly with respect to the number of tasks, and does not sacrifice plasticity to scale. Experiments conducted in benchmark continuous control and visual problems reveal that the proposed approach achieves greater knowledge transfer and performance than alternative methods.}
}
Endnote
%0 Conference Paper
%T Self-Composing Policies for Scalable Continual Reinforcement Learning
%A Mikel Malagon
%A Josu Ceberio
%A Jose A. Lozano
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-malagon24a
%I PMLR
%P 34432--34460
%U https://proceedings.mlr.press/v235/malagon24a.html
%V 235
%X This work introduces a growable and modular neural network architecture that naturally avoids catastrophic forgetting and interference in continual reinforcement learning. The structure of each module allows the selective combination of previous policies along with its internal policy, accelerating the learning process on the current task. Unlike previous growing neural network approaches, we show that the number of parameters of the proposed approach grows linearly with respect to the number of tasks, and does not sacrifice plasticity to scale. Experiments conducted in benchmark continuous control and visual problems reveal that the proposed approach achieves greater knowledge transfer and performance than alternative methods.
APA
Malagon, M., Ceberio, J. & Lozano, J. A. (2024). Self-Composing Policies for Scalable Continual Reinforcement Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:34432-34460. Available from https://proceedings.mlr.press/v235/malagon24a.html.