Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent

Surbhi Goel, Aravind Gollakota, Zhihan Jin, Sushrut Karmalkar, Adam Klivans
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:3587-3596, 2020.

Abstract

We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. In the regression setting, we prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-goel20a, title = {Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent}, author = {Goel, Surbhi and Gollakota, Aravind and Jin, Zhihan and Karmalkar, Sushrut and Klivans, Adam}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {3587--3596}, year = {2020}, editor = {III, Hal Daumé and Singh, Aarti}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/goel20a/goel20a.pdf}, url = {http://proceedings.mlr.press/v119/goel20a.html}, abstract = {We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. In the regression setting, we prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.} }
Endnote
%0 Conference Paper %T Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent %A Surbhi Goel %A Aravind Gollakota %A Zhihan Jin %A Sushrut Karmalkar %A Adam Klivans %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-goel20a %I PMLR %P 3587--3596 %U http://proceedings.mlr.press/v119/goel20a.html %V 119 %X We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. In the regression setting, we prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.
APA
Goel, S., Gollakota, A., Jin, Z., Karmalkar, S. & Klivans, A.. (2020). Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:3587-3596 Available from http://proceedings.mlr.press/v119/goel20a.html.

Related Material