Learning One-hidden-layer ReLU Networks via Gradient Descent

Xiao Zhang, Yaodong Yu, Lingxiao Wang, Quanquan Gu
Proceedings of Machine Learning Research, PMLR 89:1524-1534, 2019.

Abstract

We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent converges to the ground-truth parameters at a linear rate, up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
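
The setting in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code and makes several assumptions not stated above: the second-layer weights are fixed to 1, the teacher weights have orthonormal rows, and tensor initialization is replaced by a stand-in that starts gradient descent at a small perturbation of the ground truth. The function names, step size, sample size, and noise level are all hypothetical choices for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def predict(W, X):
    # W: (k, d) hidden-layer weights, X: (n, d) inputs -> (n,) outputs.
    # Assumption: second-layer weights are all fixed to 1.
    return relu(X @ W.T).sum(axis=1)


def empirical_risk(W, X, y):
    # Squared-loss empirical risk, (1 / 2n) * sum of squared residuals.
    residual = predict(W, X) - y
    return 0.5 * np.mean(residual ** 2)


def gradient(W, X, y):
    # (Sub)gradient of the empirical risk with respect to W, shape (k, d).
    Z = X @ W.T                              # pre-activations, (n, k)
    residual = relu(Z).sum(axis=1) - y       # (n,)
    active = (Z > 0).astype(X.dtype)         # ReLU derivative, (n, k)
    return (active * residual[:, None]).T @ X / X.shape[0]


def run_demo(n=5000, d=10, k=3, noise_std=0.1, init_radius=0.3,
             step=0.2, iters=300, seed=0):
    rng = np.random.default_rng(seed)

    # Teacher weights with orthonormal rows (an assumption for conditioning).
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    W_star = Q[:, :k].T

    # Standard Gaussian inputs and noisy teacher outputs.
    X = rng.standard_normal((n, d))
    y = predict(W_star, X) + noise_std * rng.standard_normal(n)

    # Stand-in for tensor initialization: start within a small
    # Frobenius-norm ball around the ground truth.
    P = rng.standard_normal((k, d))
    W = W_star + init_radius * P / np.linalg.norm(P)

    errors = [np.linalg.norm(W - W_star)]
    for _ in range(iters):
        W = W - step * gradient(W, X, y)
        errors.append(np.linalg.norm(W - W_star))
    return errors


if __name__ == "__main__":
    errs = run_demo()
    print("initial error:", errs[0])
    print("final error:  ", errs[-1])
```

Under these assumptions, the parameter error is expected to shrink quickly and then level off at a floor set by the noise and the sample size, which is the qualitative behavior ("linear rate up to some statistical error") that the abstract describes.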

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-zhang19g,
  title = {Learning One-hidden-layer ReLU Networks via Gradient Descent},
  author = {Zhang, Xiao and Yu, Yaodong and Wang, Lingxiao and Gu, Quanquan},
  booktitle = {Proceedings of Machine Learning Research},
  pages = {1524--1534},
  year = {2019},
  editor = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume = {89},
  series = {Proceedings of Machine Learning Research},
  month = {16--18 Apr},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v89/zhang19g/zhang19g.pdf},
  url = {http://proceedings.mlr.press/v89/zhang19g.html},
  abstract = {We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such kind of neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.}
}
Endnote
%0 Conference Paper
%T Learning One-hidden-layer ReLU Networks via Gradient Descent
%A Xiao Zhang
%A Yaodong Yu
%A Lingxiao Wang
%A Quanquan Gu
%B Proceedings of Machine Learning Research
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-zhang19g
%I PMLR
%P 1524--1534
%U http://proceedings.mlr.press/v89/zhang19g.html
%V 89
%X We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such kind of neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
APA
Zhang, X., Yu, Y., Wang, L. & Gu, Q. (2019). Learning One-hidden-layer ReLU Networks via Gradient Descent. In Proceedings of Machine Learning Research 89:1524-1534. Available from http://proceedings.mlr.press/v89/zhang19g.html
