A faster and simpler algorithm for learning shallow networks
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:981-994, 2024.
Abstract
We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.
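To make the problem setup concrete (this illustrates the data model only, not the algorithm), here is a minimal NumPy sketch of how the labeled examples are generated; the variable names and the choice of unit-norm hidden weights are our own illustrative assumptions.

    import numpy as np

    # Sketch of the learning problem: the target is
    #   f(x) = sum_{i=1}^k a_i * ReLU(<w_i, x>),  with  x ~ N(0, I_d).
    rng = np.random.default_rng(0)

    d, k, n = 50, 3, 10_000          # dimension, number of ReLUs, sample size (illustrative)
    W = rng.standard_normal((k, d))  # hidden weight vectors w_1, ..., w_k
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # normalized to unit norm (an assumption here)
    a = rng.standard_normal(k)       # combination coefficients a_1, ..., a_k

    X = rng.standard_normal((n, d))  # examples drawn from the standard Gaussian N(0, I_d)
    y = np.maximum(X @ W.T, 0.0) @ a # labels f(x) for each example

    # A learner receives (X, y) and must output a hypothesis h that is
    # epsilon-close to f in squared error over N(0, I_d).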