On Non-local Convergence Analysis of Deep Linear Networks

Kun Chen, Dachao Lin, Zhihua Zhang
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:3417-3443, 2022.

Abstract

In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which at least one layer has only one neuron. We characterize the convergence point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths that converge to one of the saddle points. We also give explicit stage-wise convergence rates for trajectories that converge to the global minimizers, and we show that these rates range from polynomial to linear. As far as we know, our results are the first to give a non-local analysis of deep linear neural networks with arbitrary balanced initialization, rather than the lazy training regime that has dominated the literature on neural networks or restricted benign initializations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-chen22p,
  title     = {On Non-local Convergence Analysis of Deep Linear Networks},
  author    = {Chen, Kun and Lin, Dachao and Zhang, Zhihua},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {3417--3443},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/chen22p/chen22p.pdf},
  url       = {https://proceedings.mlr.press/v162/chen22p.html},
  abstract  = {In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which there is at least a layer with only one neuron. We describe the convergent point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths which converge to one of the saddle points. We also show specific convergence rates of trajectories that converge to the global minimizers by stages. We conclude that the rates vary from polynomial to linear. As far as we know, our results are the first to give a non-local analysis of deep linear neural networks with arbitrary balanced initialization, rather than the lazy training regime which has dominated the literature on neural networks or the restricted benign initialization.}
}
Endnote
%0 Conference Paper
%T On Non-local Convergence Analysis of Deep Linear Networks
%A Kun Chen
%A Dachao Lin
%A Zhihua Zhang
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-chen22p
%I PMLR
%P 3417--3443
%U https://proceedings.mlr.press/v162/chen22p.html
%V 162
%X In this paper, we study the non-local convergence properties of deep linear networks. Specifically, under the quadratic loss, we consider optimizing deep linear networks in which there is at least a layer with only one neuron. We describe the convergent point of trajectories with an arbitrary balanced starting point under gradient flow, including the paths which converge to one of the saddle points. We also show specific convergence rates of trajectories that converge to the global minimizers by stages. We conclude that the rates vary from polynomial to linear. As far as we know, our results are the first to give a non-local analysis of deep linear neural networks with arbitrary balanced initialization, rather than the lazy training regime which has dominated the literature on neural networks or the restricted benign initialization.
APA
Chen, K., Lin, D. &amp; Zhang, Z. (2022). On Non-local Convergence Analysis of Deep Linear Networks. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:3417-3443. Available from https://proceedings.mlr.press/v162/chen22p.html.