A faster and simpler algorithm for learning shallow networks
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:981-994, 2024.
Abstract
We revisit the well-studied problem of learning a linear combination of k ReLU activations given labeled examples drawn from the standard d-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in poly(d, 1/ϵ) time when k = O(1), where ϵ is the target error. More precisely, their algorithm runs in time (d/ϵ)^{quasipoly(k)} and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only (d/ϵ)^{O(k^2)}.