Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:1364-1378, 2024.
Abstract
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work gave an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-order moment error tensors are small given that the lower-order ones are.
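To make the moment-based subspace idea concrete, the following is a minimal NumPy sketch, not the paper's algorithm: it estimates the order-2 Hermite moment tensor of the labels under standard Gaussian inputs and takes its top-$k$ singular subspace as an estimate of the span of the hidden weight vectors. The network parameters, the positive combining weights, and the choice of tensor order are illustrative assumptions; the paper's actual procedure works with $O(k)$-order moment tensors and relies on a Schur-polynomial analysis to control the higher-order error tensors.

```python
# Illustrative sketch only: recover an approximate weight subspace of a
# one-hidden-layer ReLU network from a low-order moment tensor under
# standard Gaussian inputs. All parameter choices below are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 2, 200_000

# Ground-truth network f(x) = sum_i a_i * relu(<w_i, x>).
# Positive combining weights a_i are used here to avoid cancellation in the
# order-2 moment; mixed-sign combinations can make low-order moments vanish,
# which is one reason higher-order tensors are needed in general.
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.uniform(0.5, 1.5, size=k)

def relu_network(X):
    return (np.maximum(X @ W.T, 0.0) * a).sum(axis=1)

X = rng.standard_normal((n, d))   # standard Gaussian inputs on R^d
y = relu_network(X)

# Empirical order-2 Hermite moment tensor T2 = E[y * (x x^T - I)].
# For a ReLU network this lies approximately in span{w_i w_i^T}.
T2 = np.einsum("n,ni,nj->ij", y, X, X) / n - y.mean() * np.eye(d)

# The top-k singular subspace of T2 estimates span{w_1, ..., w_k}.
U, S, _ = np.linalg.svd(T2)
V = U[:, :k]

# Measure how much of the true weight matrix lies outside the recovered subspace.
residual = np.linalg.norm(W.T - V @ (V.T @ W.T)) / np.linalg.norm(W)
print("relative residual of W outside recovered subspace:", residual)
```

With the sample size above, the residual is typically small, illustrating how a moment tensor pins down the relevant subspace; the full algorithm must instead work with $O(k)$-order tensors to handle general sign patterns and achieve the stated $(dk/\epsilon)^{O(k)}$ complexity.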