Fixed Design Analysis of Regularization-Based Continual Learning

Haoran Li, Jingfeng Wu, Vladimir Braverman
Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232:513-533, 2023.

Abstract

We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an $\ell_2$-regularization penalizing its deviation from the first parameter, and outputs the second parameter. For this algorithm, we provide tight upper and lower bounds on the average risk over the two tasks. Our risk bounds reveal a provable trade-off between forgetting and intransigence of the $\ell_2$-regularized CL algorithm: with a large regularization parameter, the algorithm output forgets less information about the first task but is intransigent to extract new information from the second task; and vice versa. Our results suggest that catastrophic forgetting could happen for CL with dissimilar tasks (under a precise similarity measurement), and that a well-tuned $\ell_2$-regularization can partially mitigate this issue by introducing intransigence.
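The two-step procedure described in the abstract has a closed form: the first parameter is the OLS solution on task one, and the second minimizes the task-two squared loss plus $\lambda \lVert w - w_1 \rVert^2$. The following NumPy sketch illustrates this (the function name and regularization symbol `lam` are illustrative, not taken from the paper):

```python
import numpy as np

def regularized_cl(X1, y1, X2, y2, lam):
    """Two-task l2-regularized continual learning sketch.

    Step 1: fit the first dataset (X1, y1) by ordinary least squares.
    Step 2: minimize ||X2 w - y2||^2 + lam * ||w - w1||^2, whose
    closed-form solution is (X2^T X2 + lam I)^{-1} (X2^T y2 + lam w1).
    Returns the second parameter, as the algorithm in the paper does.
    """
    w1, *_ = np.linalg.lstsq(X1, y1, rcond=None)  # OLS on task one
    d = X2.shape[1]
    w2 = np.linalg.solve(X2.T @ X2 + lam * np.eye(d),
                         X2.T @ y2 + lam * w1)
    return w2
```

The trade-off in the abstract is visible in the limits: as `lam` grows, `w2` approaches `w1` (no forgetting, full intransigence), and as `lam` goes to zero, `w2` approaches the plain OLS fit of the second task (no intransigence, potential forgetting).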

Cite this Paper


BibTeX
@InProceedings{pmlr-v232-li23b,
  title     = {Fixed Design Analysis of Regularization-Based Continual Learning},
  author    = {Li, Haoran and Wu, Jingfeng and Braverman, Vladimir},
  booktitle = {Proceedings of The 2nd Conference on Lifelong Learning Agents},
  pages     = {513--533},
  year      = {2023},
  editor    = {Chandar, Sarath and Pascanu, Razvan and Sedghi, Hanie and Precup, Doina},
  volume    = {232},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v232/li23b/li23b.pdf},
  url       = {https://proceedings.mlr.press/v232/li23b.html},
  abstract  = {We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an $\ell_2$-regularization penalizing its deviation from the first parameter, and outputs the second parameter. For this algorithm, we provide tight upper and lower bounds on the average risk over the two tasks. Our risk bounds reveal a provable trade-off between forgetting and intransigence of the $\ell_2$-regularized CL algorithm: with a large regularization parameter, the algorithm output forgets less information about the first task but is intransigent to extract new information from the second task; and vice versa. Our results suggest that catastrophic forgetting could happen for CL with dissimilar tasks (under a precise similarity measurement), and that a well-tuned $\ell_2$-regularization can partially mitigate this issue by introducing intransigence.}
}
Endnote
%0 Conference Paper
%T Fixed Design Analysis of Regularization-Based Continual Learning
%A Haoran Li
%A Jingfeng Wu
%A Vladimir Braverman
%B Proceedings of The 2nd Conference on Lifelong Learning Agents
%C Proceedings of Machine Learning Research
%D 2023
%E Sarath Chandar
%E Razvan Pascanu
%E Hanie Sedghi
%E Doina Precup
%F pmlr-v232-li23b
%I PMLR
%P 513--533
%U https://proceedings.mlr.press/v232/li23b.html
%V 232
%X We consider a continual learning (CL) problem with two linear regression tasks in the fixed design setting, where the feature vectors are assumed fixed and the labels are assumed to be random variables. We consider an $\ell_2$-regularized CL algorithm, which computes an Ordinary Least Squares parameter to fit the first dataset, then computes another parameter that fits the second dataset under an $\ell_2$-regularization penalizing its deviation from the first parameter, and outputs the second parameter. For this algorithm, we provide tight upper and lower bounds on the average risk over the two tasks. Our risk bounds reveal a provable trade-off between forgetting and intransigence of the $\ell_2$-regularized CL algorithm: with a large regularization parameter, the algorithm output forgets less information about the first task but is intransigent to extract new information from the second task; and vice versa. Our results suggest that catastrophic forgetting could happen for CL with dissimilar tasks (under a precise similarity measurement), and that a well-tuned $\ell_2$-regularization can partially mitigate this issue by introducing intransigence.
APA
Li, H., Wu, J. & Braverman, V. (2023). Fixed Design Analysis of Regularization-Based Continual Learning. Proceedings of The 2nd Conference on Lifelong Learning Agents, in Proceedings of Machine Learning Research 232:513-533. Available from https://proceedings.mlr.press/v232/li23b.html.