Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials

Ilias Diakonikolas, Daniel M. Kane
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1364-1378, 2024.

Abstract

We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
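To make the setup concrete, here is the standard moment-tensor formulation that the abstract alludes to (the normalization below is a common convention and an assumption on our part, not quoted from the paper body). For unit-norm weights $w_i \in \mathbb{R}^d$ and $x \sim \mathcal{N}(0, I_d)$, the target function and its Hermite moment tensors are
$$f(x) = \sum_{i=1}^{k} a_i \, \mathrm{ReLU}(\langle w_i, x \rangle), \qquad T_m := \mathbb{E}_x\big[ f(x) \, \mathrm{He}_m(x) \big] = \hat{c}_m \sum_{i=1}^{k} a_i \, w_i^{\otimes m},$$
where $\mathrm{He}_m$ is the order-$m$ tensor of probabilists' Hermite polynomials and $\hat{c}_m$ is the $m$-th Hermite coefficient of the ReLU. Each $T_m$ thus lies in the span of $\{w_i^{\otimes m}\}$, which is why decomposing empirical estimates of these tensors can reveal a subspace containing the weight vectors.

The Python sketch below illustrates the simplest ($m=2$) instance of this idea: estimating $T_2 = \mathbb{E}[y(xx^\top - I)]$ from samples and taking its top eigenvectors as a candidate subspace. It is a toy illustration of the general moment-based approach, not the paper's algorithm, which requires tensors of order up to $O(k)$ precisely because low-order moments alone can miss directions (e.g., when the $a_i \hat{c}_2$ contributions cancel).

import numpy as np

# Illustrative sketch (not the paper's algorithm): for Gaussian inputs,
# the degree-2 moment tensor M2 = E[y (x x^T - I)] equals c_2 * sum_i a_i w_i w_i^T
# for unit-norm weights, so its top eigenvectors span a candidate subspace.
rng = np.random.default_rng(0)
d, k, n = 20, 3, 200_000

# Hypothetical ground-truth network: f(x) = sum_i a_i * ReLU(<w_i, x>).
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm weights
a = rng.standard_normal(k)

X = rng.standard_normal((n, d))                # x ~ N(0, I_d)
y = np.maximum(X @ W.T, 0.0) @ a               # labels y = f(x)

# Empirical second moment tensor M2 ~ E[y (x x^T - I)].
M2 = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)

# Top-k eigenvectors (by eigenvalue magnitude) estimate span{w_1, ..., w_k}.
eigvals, eigvecs = np.linalg.eigh(M2)
order = np.argsort(-np.abs(eigvals))
U = eigvecs[:, order[:k]]                      # candidate subspace basis

# Sanity check: projecting the true weights onto U should lose little mass.
residual = np.linalg.norm(W.T - U @ (U.T @ W.T)) / np.linalg.norm(W)
print(f"relative mass of W outside recovered subspace: {residual:.3f}")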

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-diakonikolas24c,
  title     = {Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials},
  author    = {Diakonikolas, Ilias and Kane, Daniel M.},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {1364--1378},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/diakonikolas24c/diakonikolas24c.pdf},
  url       = {https://proceedings.mlr.press/v247/diakonikolas24c.html},
  abstract  = {We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.}
}
Endnote
%0 Conference Paper
%T Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
%A Ilias Diakonikolas
%A Daniel M. Kane
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-diakonikolas24c
%I PMLR
%P 1364--1378
%U https://proceedings.mlr.press/v247/diakonikolas24c.html
%V 247
%X We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
APA
Diakonikolas, I. & Kane, D.M. (2024). Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:1364-1378. Available from https://proceedings.mlr.press/v247/diakonikolas24c.html.