Learning One-hidden-layer ReLU Networks via Gradient Descent

Xiao Zhang; Yaodong Yu; Lingxiao Wang; Quanquan Gu

Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1524-1534, 2019.

Abstract

We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks via empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate, up to a statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
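The abstract describes the learning setting only in words. The NumPy sketch below illustrates it under assumptions of our own, not taken from the paper. The teacher uses unit output weights, so y = Σ_j ReLU(w_jᵀx) + ε. The ground-truth weights have orthonormal columns. The paper's tensor initialization is replaced by a small random perturbation of the ground truth. All function names (`make_teacher_data`, `risk_gradient`, and so on) are hypothetical, and the step size, dimensions, and noise level are illustrative choices rather than values from the paper.

```python
import numpy as np


def relu(z):
    """Elementwise ReLU activation."""
    return np.maximum(z, 0.0)


def make_teacher_data(d, k, n, noise_std, rng):
    """Sample Gaussian inputs and noisy outputs from a one-hidden-layer ReLU teacher.

    Assumption (illustrative): unit output weights, y = sum_j relu(w_j^T x) + noise.
    """
    # Ground-truth weights with orthonormal columns (an illustrative choice).
    w_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    x = rng.standard_normal((n, d))  # rows x_i ~ N(0, I_d)
    y = relu(x @ w_star).sum(axis=1) + noise_std * rng.standard_normal(n)
    return x, y, w_star


def empirical_risk(w, x, y):
    """L(W) = 1/(2n) * sum_i (sum_j relu(w_j^T x_i) - y_i)^2."""
    residual = relu(x @ w).sum(axis=1) - y
    return 0.5 * np.mean(residual ** 2)


def risk_gradient(w, x, y):
    """(Sub)gradient of the empirical risk with respect to W, shape (d, k)."""
    pre = x @ w                            # (n, k) pre-activations w_j^T x_i
    residual = relu(pre).sum(axis=1) - y   # (n,) prediction errors
    active = (pre > 0).astype(x.dtype)     # ReLU derivative, taken as 0 at 0
    return x.T @ (residual[:, None] * active) / x.shape[0]


def gradient_descent(w0, x, y, step_size=0.5, num_iters=1000):
    """Plain full-batch gradient descent on the empirical risk."""
    w = w0.copy()
    for _ in range(num_iters):
        w -= step_size * risk_gradient(w, x, y)
    return w


def relative_error(w, w_star):
    """Frobenius-norm distance to the ground truth, relative to ||W*||."""
    return np.linalg.norm(w - w_star) / np.linalg.norm(w_star)


rng = np.random.default_rng(0)
x, y, w_star = make_teacher_data(d=10, k=3, n=5000, noise_std=0.1, rng=rng)

# Stand-in for tensor initialization (NOT the paper's method): start from a
# small random perturbation of the ground truth to mimic a good initial point.
w0 = w_star + 0.05 * rng.standard_normal(w_star.shape)
w_hat = gradient_descent(w0, x, y)

print(f"initial relative error: {relative_error(w0, w_star):.4f}")
print(f"final relative error:   {relative_error(w_hat, w_star):.4f}")
print(f"final empirical risk:   {empirical_risk(w_hat, x, y):.5f}")
```

Because the outputs are noisy, the iterates approach the empirical risk minimizer rather than the ground truth exactly. The remaining error reflects the "statistical error" the abstract refers to, and it shrinks as the sample size n grows. Starting near the ground truth also sidesteps the permutation symmetry among hidden neurons, which is the role a good initialization plays in this setting.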

Cite this Paper

BibTeX


@InProceedings{pmlr-v89-zhang19g,
  title     = {Learning One-hidden-layer ReLU Networks via Gradient Descent},
  author    = {Zhang, Xiao and Yu, Yaodong and Wang, Lingxiao and Gu, Quanquan},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1524--1534},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/zhang19g/zhang19g.pdf},
  url       = {https://proceedings.mlr.press/v89/zhang19g.html},
  abstract  = {We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks via empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate, up to a statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.}
}

EndNote

%0 Conference Paper
%T Learning One-hidden-layer ReLU Networks via Gradient Descent
%A Xiao Zhang
%A Yaodong Yu
%A Lingxiao Wang
%A Quanquan Gu
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-zhang19g
%I PMLR
%P 1524--1534
%U https://proceedings.mlr.press/v89/zhang19g.html
%V 89
%X We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks via empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate, up to a statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.

APA


Zhang, X., Yu, Y., Wang, L. & Gu, Q. (2019). Learning One-hidden-layer ReLU Networks via Gradient Descent. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1524-1534. Available from https://proceedings.mlr.press/v89/zhang19g.html.
