Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Ohad Shamir
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2691-2713, 2019.

Abstract

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
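To make the setting concrete, below is a minimal numerical sketch (an illustration only, not the paper's experimental protocol): plain gradient descent on $f(\prod_{i=1}^{k} w_i)$ with the illustrative choice $f(x) = (x-1)^2$, scalar weights drawn from a standard normal initialization, and a fixed step size, counting the iterations needed to drive the loss below a tolerance. The specific loss, step size, tolerance, and initialization scale are assumptions made here for the sketch.

# Illustrative sketch (assumed choices: f(x) = (x - 1)^2, fixed step size,
# standard normal initialization) -- not the paper's exact setup.
import numpy as np

def iterations_to_converge(k, lr=1e-3, tol=1e-4, max_iters=2_000_000, seed=0):
    """Run gradient descent on (prod_i w_i - 1)^2 over scalars w_1, ..., w_k."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 1.0, size=k)  # standard random initialization
    for t in range(max_iters):
        p = np.prod(w)
        if (p - 1.0) ** 2 < tol:
            return t
        # d/dw_i (p - 1)^2 = 2 * (p - 1) * prod_{j != i} w_j
        grad = np.array([2.0 * (p - 1.0) * np.prod(np.delete(w, i))
                         for i in range(k)])
        w = w - lr * grad
    return max_iters  # no convergence within the iteration budget

for k in (2, 4, 6, 8):
    print(f"depth k={k}: {iterations_to_converge(k)} iterations")

On typical runs the iteration count grows sharply with the depth k, and for some seeds gradient descent stalls with the product near zero; this is the kind of behavior captured by the paper's exponential-in-depth convergence-time result in the one-dimensional setting.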

Cite this Paper


BibTeX
@InProceedings{pmlr-v99-shamir19a,
  title     = {Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks},
  author    = {Shamir, Ohad},
  booktitle = {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages     = {2691--2713},
  year      = {2019},
  editor    = {Beygelzimer, Alina and Hsu, Daniel},
  volume    = {99},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v99/shamir19a/shamir19a.pdf},
  url       = {https://proceedings.mlr.press/v99/shamir19a.html}
}
APA
Shamir, O. (2019). Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:2691-2713. Available from https://proceedings.mlr.press/v99/shamir19a.html.
