How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation

Ekdeep Singh Lubana, Puja Trivedi, Danai Koutra, Robert Dick
Proceedings of The 1st Conference on Lifelong Learning Agents, PMLR 199:819-837, 2022.

Abstract

Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper has the goal of better explaining a popularly used technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating current and previous values of model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rates of more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) dependence of parameter interpolation on training hyperparameters, which often leads to training instability and (b) assignment of lower importance to deeper layers, which are generally the place forgetting occurs in DNNs. Via a simple modification to the order of operations, we show these drawbacks can be easily avoided, resulting in 6.2% higher average accuracy at 4.5% lower average forgetting. We confirm the robustness of our results by training over 2000 models in different settings.

Cite this Paper


BibTeX
@InProceedings{pmlr-v199-lubana22a, title = {How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation}, author = {Lubana, Ekdeep Singh and Trivedi, Puja and Koutra, Danai and Dick, Robert}, booktitle = {Proceedings of The 1st Conference on Lifelong Learning Agents}, pages = {819--837}, year = {2022}, editor = {Chandar, Sarath and Pascanu, Razvan and Precup, Doina}, volume = {199}, series = {Proceedings of Machine Learning Research}, month = {22--24 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v199/lubana22a/lubana22a.pdf}, url = {https://proceedings.mlr.press/v199/lubana22a.html}, abstract = {Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper has the goal of better explaining a popularly used technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating current and previous values of model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rates of more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) dependence of parameter interpolation on training hyperparameters, which often leads to training instability and (b) assignment of lower importance to deeper layers, which are generally the place forgetting occurs in DNNs. Via a simple modification to the order of operations, we show these drawbacks can be easily avoided, resulting in 6.2% higher average accuracy at 4.5% lower average forgetting. We confirm the robustness of our results by training over 2000 models in different settings.} }
Endnote
%0 Conference Paper %T How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation %A Ekdeep Singh Lubana %A Puja Trivedi %A Danai Koutra %A Robert Dick %B Proceedings of The 1st Conference on Lifelong Learning Agents %C Proceedings of Machine Learning Research %D 2022 %E Sarath Chandar %E Razvan Pascanu %E Doina Precup %F pmlr-v199-lubana22a %I PMLR %P 819--837 %U https://proceedings.mlr.press/v199/lubana22a.html %V 199 %X Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper has the goal of better explaining a popularly used technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating current and previous values of model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rates of more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) dependence of parameter interpolation on training hyperparameters, which often leads to training instability and (b) assignment of lower importance to deeper layers, which are generally the place forgetting occurs in DNNs. Via a simple modification to the order of operations, we show these drawbacks can be easily avoided, resulting in 6.2% higher average accuracy at 4.5% lower average forgetting. We confirm the robustness of our results by training over 2000 models in different settings.
APA
Lubana, E.S., Trivedi, P., Koutra, D. & Dick, R.. (2022). How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation. Proceedings of The 1st Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 199:819-837 Available from https://proceedings.mlr.press/v199/lubana22a.html.

Related Material