Implicit Bias of the Step Size in Linear Diagonal Neural Networks

Mor Shpigel Nacson, Kavya Ravichandran, Nathan Srebro, Daniel Soudry
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:16270-16295, 2022.

Abstract

Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. In particular, we show how using a large step size for non-centered data can change the implicit bias from a "kernel" type behavior to a "rich" (sparsity-inducing) regime — even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so by using dynamic stability, proving that convergence to dynamically stable global minima entails a bound on some weighted $\ell_1$-norm of the linear predictor, i.e., a "rich" regime. We prove this leads to good generalization in a sparse regression setting.
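
To make the setting concrete, here is a minimal numerical sketch (not the paper's code or experiments): a diagonal linear network with predictor beta = w_plus^2 - w_minus^2 is trained by full-batch gradient descent on an underdetermined sparse regression problem with non-centered features, and the $\ell_1$-norm of the learned predictor is reported so the kernel-versus-rich distinction can be probed across step sizes. The dimensions, the initialization scale alpha, and the step sizes tried below are illustrative assumptions, not values from the paper.

# Minimal illustrative sketch (assumptions: dimensions, alpha, and step sizes are arbitrary)
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 80, 3                      # samples, dimension, true sparsity
X = rng.normal(size=(n, d)) + 1.0        # non-centered data (mean-1 features)
beta_star = np.zeros(d)
beta_star[:k] = 1.0                      # sparse ground-truth predictor
y = X @ beta_star

def train(step_size, alpha=0.1, steps=50_000):
    """Full-batch GD on the squared loss of the diagonal parameterization."""
    w_plus = np.full(d, alpha)
    w_minus = np.full(d, alpha)          # beta = w_plus**2 - w_minus**2 starts at zero
    for _ in range(steps):
        beta = w_plus**2 - w_minus**2
        r = X @ beta - y                 # residuals
        if not np.isfinite(r).all():     # this step size diverged on this run
            break
        g = X.T @ r / n                  # gradient w.r.t. beta
        w_plus = w_plus - step_size * 2.0 * g * w_plus
        w_minus = w_minus + step_size * 2.0 * g * w_minus
    return w_plus**2 - w_minus**2

# Compare the l1 norm of the learned predictor across step sizes; a smaller
# l1 norm indicates the sparsity-inducing ("rich") end of the spectrum.
for step_size in (1e-3, 1e-2):
    beta = train(step_size)
    rmse = np.sqrt(np.mean((X @ beta - y) ** 2))
    print(f"step size {step_size:g}: train RMSE {rmse:.2e}, "
          f"||beta||_1 = {np.abs(beta).sum():.3f}, "
          f"||beta - beta_star||_inf = {np.abs(beta - beta_star).max():.3f}")
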

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-nacson22a,
  title     = {Implicit Bias of the Step Size in Linear Diagonal Neural Networks},
  author    = {Nacson, Mor Shpigel and Ravichandran, Kavya and Srebro, Nathan and Soudry, Daniel},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {16270--16295},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/nacson22a/nacson22a.pdf},
  url       = {https://proceedings.mlr.press/v162/nacson22a.html},
  abstract  = {Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. In particular, we show how using large step size for non-centered data can change the implicit bias from a "kernel" type behavior to a "rich" (sparsity-inducing) regime — even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so by using dynamic stability, proving that convergence to dynamically stable global minima entails a bound on some weighted $\ell_1$-norm of the linear predictor, i.e. a "rich" regime. We prove this leads to good generalization in a sparse regression setting.}
}
APA
Nacson, M.S., Ravichandran, K., Srebro, N. & Soudry, D. (2022). Implicit Bias of the Step Size in Linear Diagonal Neural Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:16270-16295. Available from https://proceedings.mlr.press/v162/nacson22a.html.