Optimal L2 Regularization in High-dimensional Continual Linear Regression

Gilad Karpel, Edward Moroshko, Ran Levinstein, Ron Meir, Daniel Soudry, Itay Evron
Proceedings of The 37th International Conference on Algorithmic Learning Theory, PMLR 313:1-62, 2026.

Abstract

We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We demonstrate that isotropic regularization mitigates label noise under both single-teacher and multiple i.i.d. teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or used memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and neural networks, illustrating how this scaling law affects generalization and offering a practical recipe for the design of continual learning systems.
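
As a rough illustration of the setting, the sketch below instantiates one plausible version of regularized continual linear regression in numpy: each task is fit by minimizing its squared loss plus an isotropic L2 penalty anchored at the previous weights, with a fixed strength lam set to the $T/\ln T$ scaling highlighted above. The anchoring choice, the i.i.d. Gaussian data model, and the specific constants are assumptions made for illustration, not the paper's exact protocol or tuning.

import numpy as np

def continual_ridge(tasks, d, lam):
    """Sequentially fit tasks, regularizing each solution toward the previous one.

    tasks : list of (X_t, y_t) pairs with X_t of shape (n_t, d).
    lam   : fixed isotropic regularization strength.
    """
    w = np.zeros(d)
    for X, y in tasks:
        # Closed-form minimizer of ||X w - y||^2 + lam * ||w - w_prev||^2.
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w)
    return w

# Hypothetical experiment: T noisy tasks generated by a single linear teacher
# in an overparameterized regime (d much larger than the per-task sample size).
rng = np.random.default_rng(0)
d, n_t, T, sigma = 400, 20, 30, 0.5
w_star = rng.standard_normal(d) / np.sqrt(d)
tasks = []
for _ in range(T):
    X = rng.standard_normal((n_t, d))
    tasks.append((X, X @ w_star + sigma * rng.standard_normal(n_t)))

lam = T / np.log(T)  # near-linear scaling of the fixed regularization strength with T
w_hat = continual_ridge(tasks, d, lam)
print("parameter error:", np.sum((w_hat - w_star) ** 2))

In this toy run, sweeping lam over several orders of magnitude and comparing the resulting parameter error is a quick way to check whether values near $T/\ln T$ are competitive, in the spirit of the linear regression experiments described in the abstract.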

Cite this Paper


BibTeX
@InProceedings{pmlr-v313-karpel26a,
  title     = {Optimal L2 Regularization in High-dimensional Continual Linear Regression},
  author    = {Karpel, Gilad and Moroshko, Edward and Levinstein, Ran and Meir, Ron and Soudry, Daniel and Evron, Itay},
  booktitle = {Proceedings of The 37th International Conference on Algorithmic Learning Theory},
  pages     = {1--62},
  year      = {2026},
  editor    = {Telgarsky, Matus and Ullman, Jonathan},
  volume    = {313},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v313/main/assets/karpel26a/karpel26a.pdf},
  url       = {https://proceedings.mlr.press/v313/karpel26a.html},
  abstract  = {We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We demonstrate that isotropic regularization mitigates label noise under both single-teacher and multiple i.i.d. teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or used memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and neural networks, illustrating how this scaling law affects generalization and offering a practical recipe for the design of continual learning systems.}
}
Endnote
%0 Conference Paper
%T Optimal L2 Regularization in High-dimensional Continual Linear Regression
%A Gilad Karpel
%A Edward Moroshko
%A Ran Levinstein
%A Ron Meir
%A Daniel Soudry
%A Itay Evron
%B Proceedings of The 37th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2026
%E Matus Telgarsky
%E Jonathan Ullman
%F pmlr-v313-karpel26a
%I PMLR
%P 1--62
%U https://proceedings.mlr.press/v313/karpel26a.html
%V 313
%X We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization loss in the high-dimensional regime that holds for arbitrary linear teachers. We demonstrate that isotropic regularization mitigates label noise under both single-teacher and multiple i.i.d. teacher settings, whereas prior work accommodating multiple teachers either did not employ regularization or used memory-demanding methods. Furthermore, we prove that the optimal fixed regularization strength scales nearly linearly with the number of tasks $T$, specifically as $T/\ln T$. To our knowledge, this is the first such result in theoretical continual learning. Finally, we validate our theoretical findings through experiments on linear regression and neural networks, illustrating how this scaling law affects generalization and offering a practical recipe for the design of continual learning systems.
APA
Karpel, G., Moroshko, E., Levinstein, R., Meir, R., Soudry, D. & Evron, I. (2026). Optimal L2 Regularization in High-dimensional Continual Linear Regression. Proceedings of The 37th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 313:1-62. Available from https://proceedings.mlr.press/v313/karpel26a.html.
