Implicit Regularization Towards Rank Minimization in ReLU Networks

Nadav Timor, Gal Vardi, Ohad Shamir
Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:1429-1459, 2023.

Abstract

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth $2$ and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size $2$). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
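The depth-2 linear-network result that the abstract takes as its starting point can be illustrated numerically. The sketch below (not from the paper; all hyperparameters are illustrative) trains a two-layer linear network $x \mapsto W_2 W_1 x$ with plain gradient descent on the square loss, starting from a small random initialization, on data generated by a rank-1 linear map. The product $W_2 W_1$ ends up essentially rank 1, consistent with the rank-minimization heuristic described for the linear case.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Rank-1 target map: y = A x with rank(A) = 1.
u = rng.standard_normal((d, 1))
v = rng.standard_normal((d, 1))
A = u @ v.T / d

X = rng.standard_normal((d, 200))
Y = A @ X

# Depth-2 linear network f(x) = W2 @ W1 @ x, small initialization
# (small init is what pushes gradient descent toward low-rank products).
scale = 1e-3
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

lr = 0.05
n = X.shape[1]
for _ in range(5000):
    R = W2 @ W1 @ X - Y           # residual on the training set
    G2 = R @ (W1 @ X).T / n       # gradient w.r.t. W2
    G1 = W2.T @ R @ X.T / n       # gradient w.r.t. W1
    W2 -= lr * G2
    W1 -= lr * G1

s = np.linalg.svd(W2 @ W1, compute_uv=False)
print(s[0] / max(s[1], 1e-12))   # large ratio => product is near rank 1
```

The same experiment with ReLU activations is exactly where, per the paper's negative results, this clean picture can break down.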

Cite this Paper


BibTeX
@InProceedings{pmlr-v201-timor23a,
  title     = {Implicit Regularization Towards Rank Minimization in ReLU Networks},
  author    = {Timor, Nadav and Vardi, Gal and Shamir, Ohad},
  booktitle = {Proceedings of The 34th International Conference on Algorithmic Learning Theory},
  pages     = {1429--1459},
  year      = {2023},
  editor    = {Agrawal, Shipra and Orabona, Francesco},
  volume    = {201},
  series    = {Proceedings of Machine Learning Research},
  month     = {20 Feb--23 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v201/timor23a/timor23a.pdf},
  url       = {https://proceedings.mlr.press/v201/timor23a.html},
  abstract  = {We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth $2$ and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size $2$). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.}
}
Endnote
%0 Conference Paper
%T Implicit Regularization Towards Rank Minimization in ReLU Networks
%A Nadav Timor
%A Gal Vardi
%A Ohad Shamir
%B Proceedings of The 34th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Shipra Agrawal
%E Francesco Orabona
%F pmlr-v201-timor23a
%I PMLR
%P 1429--1459
%U https://proceedings.mlr.press/v201/timor23a.html
%V 201
%X We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth $2$ and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size $2$). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
APA
Timor, N., Vardi, G. & Shamir, O. (2023). Implicit Regularization Towards Rank Minimization in ReLU Networks. Proceedings of The 34th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 201:1429-1459. Available from https://proceedings.mlr.press/v201/timor23a.html.
