A faster and simpler algorithm for learning shallow networks

Sitan Chen, Shyam Narayanan
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:981-994, 2024.

Abstract

We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.
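To make the setup concrete, the following is a minimal sketch of the data model the abstract describes: inputs drawn from the standard $d$-dimensional Gaussian measure, with labels given by a linear combination of $k$ ReLU activations. It illustrates only the learning problem, not the authors' algorithm; the names d, k, W, a, and sample_examples are placeholders introduced for this sketch and do not come from the paper.

import numpy as np

def relu(z):
    # ReLU activation applied elementwise
    return np.maximum(z, 0.0)

def sample_examples(a, W, n, rng):
    """Draw n labeled examples (x, y) with x ~ N(0, I_d) and
    y = sum_i a_i * ReLU(<w_i, x>), for hidden directions w_i (rows of W)
    and mixing coefficients a_i."""
    d = W.shape[1]
    X = rng.standard_normal((n, d))   # inputs from the standard Gaussian measure
    y = relu(X @ W.T) @ a             # labels from the k-ReLU network
    return X, y

# Example usage with illustrative dimensions
rng = np.random.default_rng(0)
d, k, n = 10, 3, 1000
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm hidden directions
a = rng.standard_normal(k)                     # unknown mixing coefficients
X, y = sample_examples(a, W, n, rng)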

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-chen24b,
  title     = {A faster and simpler algorithm for learning shallow networks},
  author    = {Chen, Sitan and Narayanan, Shyam},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {981--994},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/chen24b/chen24b.pdf},
  url       = {https://proceedings.mlr.press/v247/chen24b.html},
  abstract  = {We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.}
}
Endnote
%0 Conference Paper
%T A faster and simpler algorithm for learning shallow networks
%A Sitan Chen
%A Shyam Narayanan
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-chen24b
%I PMLR
%P 981--994
%U https://proceedings.mlr.press/v247/chen24b.html
%V 247
%X We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.
APA
Chen, S. & Narayanan, S. (2024). A faster and simpler algorithm for learning shallow networks. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:981-994. Available from https://proceedings.mlr.press/v247/chen24b.html.