Understanding Learning Invariance in Deep Linear Networks

Hao Duan, Guido Montúfar
Proceedings of the Geometry, Topology, and Machine Learning Workshop, PMLR 325:1-45, 2026.

Abstract

Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can perform comparably to explicitly invariant models, theoretical insights remain scarce. In this paper, we provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We focus on mean squared error regression with deep linear networks, which parametrize rank-bounded linear maps and can be hard-wired to be invariant to specific group actions. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum. By contrast, regularization introduces additional critical points, though they remain saddles except for the global optimum. Moreover, we demonstrate that the regularization path is continuous and converges to the hard-wired solution.
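
The three approaches named in the abstract can be illustrated on a toy problem. The following is a minimal sketch and is not taken from the paper: it works directly with the end-to-end matrix W of a linear model (ignoring the layer factorization and any rank bound), assumes the cyclic shift group acting on the input coordinates, and assumes a penalty of the form lam * sum_g ||W g - W||_F^2 for the regularized variant. All function names are hypothetical.

```python
import numpy as np

def cyclic_shift_group(d):
    """Permutation matrices of the cyclic shift group acting on R^d (assumed example group)."""
    eye = np.eye(d)
    return [np.roll(eye, k, axis=1) for k in range(d)]

def fit_augmented(X, Y, group):
    """Data augmentation: least squares on all transformed copies g @ X with targets Y."""
    X_aug = np.hstack([g @ X for g in group])
    Y_aug = np.hstack([Y for _ in group])
    return Y_aug @ np.linalg.pinv(X_aug)

def fit_hardwired(X, Y, group):
    """Hard-wiring: parametrize W = V @ B.T, where B spans the group-fixed subspace."""
    P = sum(group) / len(group)           # orthogonal projector onto fixed vectors
    eigvals, eigvecs = np.linalg.eigh(P)  # P is symmetric for this group
    B = eigvecs[:, eigvals > 0.5]         # eigenvectors with eigenvalue 1
    V = Y @ np.linalg.pinv(B.T @ X)       # least squares in the invariant features
    return V @ B.T

def fit_regularized(X, Y, group, lam):
    """Regularization (assumed penalty): min ||W X - Y||^2 + lam * sum_g ||W g - W||^2."""
    I = np.eye(X.shape[0])
    R = sum((g - I) @ (g - I).T for g in group)
    # Stationarity condition: W (X X^T + lam R) = Y X^T, with a symmetric left factor.
    return np.linalg.solve(X @ X.T + lam * R, X @ Y.T).T

rng = np.random.default_rng(0)
d, m, n = 4, 2, 10
X = rng.normal(size=(d, n))   # inputs as columns
Y = rng.normal(size=(m, n))   # targets as columns
group = cyclic_shift_group(d)

W_aug = fit_augmented(X, Y, group)
W_hard = fit_hardwired(X, Y, group)
W_reg = fit_regularized(X, Y, group, lam=1e6)

print("augmented vs hard-wired gap:", np.abs(W_aug - W_hard).max())
print("regularized vs hard-wired gap:", np.abs(W_reg - W_hard).max())
```

In this unconstrained toy setting the augmented and hard-wired fits give the same invariant map, and the regularized fit approaches it as lam grows. This only mirrors the flavor of the abstract's statements; the paper's results concern critical points of deep linear networks with rank bounds, which this sketch does not model.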

Cite this Paper


BibTeX
@InProceedings{pmlr-v325-duan26a,
  title = {Understanding Learning Invariance in Deep Linear Networks},
  author = {Duan, Hao and Mont\'{u}far, Guido},
  booktitle = {Proceedings of the Geometry, Topology, and Machine Learning Workshop},
  pages = {1--45},
  year = {2026},
  editor = {Bleher, Michael and Jensen, Freya and Maier, Levin and Taha, Diaaeldin and Wienhard, Anna},
  volume = {325},
  series = {Proceedings of Machine Learning Research},
  month = {10--14 Nov},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v325/main/assets/duan26a/duan26a.pdf},
  url = {https://proceedings.mlr.press/v325/duan26a.html},
  abstract = {Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can perform comparably to explicitly invariant models, theoretical insights remain scarce. In this paper, we provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We focus on mean squared error regression with deep linear networks, which parametrize rank-bounded linear maps and can be hard-wired to be invariant to specific group actions. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum. By contrast, regularization introduces additional critical points, though they remain saddles except for the global optimum. Moreover, we demonstrate that the regularization path is continuous and converges to the hard-wired solution.}
}
Endnote
%0 Conference Paper
%T Understanding Learning Invariance in Deep Linear Networks
%A Hao Duan
%A Guido Montúfar
%B Proceedings of the Geometry, Topology, and Machine Learning Workshop
%C Proceedings of Machine Learning Research
%D 2026
%E Michael Bleher
%E Freya Jensen
%E Levin Maier
%E Diaaeldin Taha
%E Anna Wienhard
%F pmlr-v325-duan26a
%I PMLR
%P 1--45
%U https://proceedings.mlr.press/v325/duan26a.html
%V 325
%X Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can perform comparably to explicitly invariant models, theoretical insights remain scarce. In this paper, we provide a theoretical comparison of three approaches for achieving invariance: data augmentation, regularization, and hard-wiring. We focus on mean squared error regression with deep linear networks, which parametrize rank-bounded linear maps and can be hard-wired to be invariant to specific group actions. We show that the critical points of the optimization problems for hard-wiring and data augmentation are identical, consisting solely of saddles and the global optimum. By contrast, regularization introduces additional critical points, though they remain saddles except for the global optimum. Moreover, we demonstrate that the regularization path is continuous and converges to the hard-wired solution.
APA
Duan, H. & Montúfar, G. (2026). Understanding Learning Invariance in Deep Linear Networks. Proceedings of the Geometry, Topology, and Machine Learning Workshop, in Proceedings of Machine Learning Research 325:1-45. Available from https://proceedings.mlr.press/v325/duan26a.html.
