Understanding Forgetting in Continual Learning with Linear Regression

Meng Ding; Kaiyi Ji; Di Wang; Jinhui Xu

Understanding Forgetting in Continual Learning with Linear Regression

Meng Ding, Kaiyi Ji, Di Wang, Jinhui Xu

Proceedings of the 41st International Conference on Machine Learning, PMLR 235:10978-11001, 2024.

Abstract

Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to

$\textit{catastrophic forgetting}$ , remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both under-parameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence—where tasks with larger eigenvalues in their population data covariance matrices are trained later—tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both under-parameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.

Cite this Paper

BibTeX


@InProceedings{pmlr-v235-ding24c,
  title = 	 {Understanding Forgetting in Continual Learning with Linear Regression},
  author =       {Ding, Meng and Ji, Kaiyi and Wang, Di and Xu, Jinhui},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {10978--11001},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ding24c/ding24c.pdf},
  url = 	 {https://proceedings.mlr.press/v235/ding24c.html},
  abstract = 	 {Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to $\textit{catastrophic forgetting}$, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both under-parameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence—where tasks with larger eigenvalues in their population data covariance matrices are trained later—tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both under-parameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.}
}

Endnote

%0 Conference Paper
%T Understanding Forgetting in Continual Learning with Linear Regression
%A Meng Ding
%A Kaiyi Ji
%A Di Wang
%A Jinhui Xu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp	
%F pmlr-v235-ding24c
%I PMLR
%P 10978--11001
%U https://proceedings.mlr.press/v235/ding24c.html
%V 235
%X Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to $\textit{catastrophic forgetting}$, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both under-parameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence—where tasks with larger eigenvalues in their population data covariance matrices are trained later—tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both under-parameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.

APA


Ding, M., Ji, K., Wang, D. & Xu, J.. (2024). Understanding Forgetting in Continual Learning with Linear Regression. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:10978-11001 Available from https://proceedings.mlr.press/v235/ding24c.html.

Understanding Forgetting in Continual Learning with Linear Regression

Abstract

Cite this Paper

Related Material