On the Hardness of Learning One Hidden Layer Neural Networks
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:700-701, 2025.
Abstract
In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. It is well known, due to (Klivans and Sherstov, 2009), that without further assumptions on the distribution $\mathcal{D}$, e.g., when $\mathcal{D}$ can be supported over the Boolean hypercube, learning even one hidden layer neural networks is impossible (or "hard") for polynomial-time estimators under standard cryptographic assumptions. Given the success of neural networks in practice, a long line of recent work has instead studied the canonical continuous input distribution, where $\mathcal{D}$ is the isotropic Gaussian, i.e., $\mathcal{D} = N(0, I_d)$; this is also the setting we follow in this work. Yet, despite this effort, it remains open whether there is a polynomial-time algorithm for learning one hidden layer neural networks when $\mathcal{D} = N(0, I_d)$. It is known that a single neuron, i.e., a neural network with zero hidden layers, can be learned in polynomial time (Zarifis et al., 2024), while neural networks with more than two hidden layers are hard to learn (Chen et al., 2022). Nevertheless, the case of one hidden layer neural networks is not well understood. In this paper, we close this gap in the literature by answering the question of efficient learnability of neural networks with one hidden layer. We establish that, under the CLWE assumption from cryptography (Bruna et al., 2021), learning the class of one hidden layer neural networks of polynomial size under standard Gaussian inputs and polynomially small Gaussian noise is indeed computationally hard. Importantly, solving CLWE in polynomial time implies a polynomial-time quantum algorithm that solves the worst-case gap shortest vector problem (GapSVP) within polynomial factors, a task widely believed to be hard in cryptography and the algorithmic theory of lattices (Micciancio and Regev, 2009). En route, we prove the hardness of learning Lipschitz periodic functions under standard Gaussian inputs and polynomially small Gaussian noise; this improves upon the previous result of (Song et al., 2021), which established hardness under polynomially small adversarial noise. We also utilize the more general reductions between CLWE and classical LWE due to (Gupte et al., 2022). In particular, we show that if we assume the hardness of GapSVP with subexponential approximation factors $2^{O(d^{\delta})}$ for $\delta \in (0,1)$, then learning one hidden layer neural networks of polynomial size is hard under Gaussian noise of variance $2^{-d^{\eta}}$, where $\eta = \frac{\delta}{1+\delta} \in (0, 1/2)$. The current state-of-the-art algorithm for GapSVP is the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm (Lenstra et al., 1982), which achieves an approximation factor of $2^{\Theta(d)}$. Hence, our results show that a polynomial-time learning algorithm for one hidden layer neural networks under Gaussian noise of any variance $\sigma^2 \geq 2^{-o(\sqrt{d})}$ would imply a major algorithmic breakthrough in the theory of lattices.
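
To make the learning setup described above concrete, the following is a minimal sketch of how labeled samples in this problem are generated: inputs are standard Gaussian in $\mathbb{R}^d$, and labels are the output of a one hidden layer ReLU network of polynomial size plus a small amount of Gaussian noise. The concrete dimension, width, noise level, and weight distributions below are illustrative assumptions for the example, not parameters taken from the paper.

```python
# Sketch of the data-generating process studied in the paper:
# (x, y) with x ~ N(0, I_d) and y = f(x) + Gaussian noise, where
# f(x) = sum_i a_i * ReLU(<w_i, x> + b_i) is a one hidden layer ReLU network.
# All numeric choices below are illustrative, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

d = 50            # input dimension (illustrative)
k = 100           # hidden width, poly(d) (illustrative)
sigma = d ** -2.0 # polynomially small noise standard deviation (illustrative)

# Target network parameters; the learner only sees samples, never these weights.
W = rng.standard_normal((k, d)) / np.sqrt(d)  # hidden-layer weights w_i (rows)
b = rng.standard_normal(k)                    # hidden-layer biases b_i
a = rng.standard_normal(k) / np.sqrt(k)       # output-layer weights a_i


def sample(n):
    """Draw n labeled examples: standard Gaussian inputs, noisy network labels."""
    X = rng.standard_normal((n, d))           # each row is x ~ N(0, I_d)
    y = np.maximum(X @ W.T + b, 0.0) @ a      # f(x) for every row, vectorized ReLU layer
    y = y + sigma * rng.standard_normal(n)    # additive Gaussian label noise
    return X, y


X_train, y_train = sample(10_000)
```

The hardness result concerns recovering (or predicting with) the unknown network from such samples in polynomial time; the sketch only fixes one arbitrary target network to show the shape of the data.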
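
As a worked illustration of the trade-off stated above between the assumed GapSVP approximation factor and the tolerated noise variance, one can plug concrete values of $\delta$ into the stated relation (the particular values below are only examples):
\[
  \eta \;=\; \frac{\delta}{1+\delta} \in \Bigl(0, \tfrac{1}{2}\Bigr), \qquad \delta \in (0,1),
\]
\[
  \delta = \tfrac{1}{2} \;\Rightarrow\; \eta = \tfrac{1}{3}, \qquad
  \delta = \tfrac{3}{4} \;\Rightarrow\; \eta = \tfrac{3}{7}, \qquad
  \delta \to 1 \;\Rightarrow\; \eta \to \tfrac{1}{2}.
\]
That is, the larger the subexponential approximation factor $2^{O(d^{\delta})}$ for which GapSVP is assumed hard (a stronger assumption as $\delta$ grows, though still far below the $2^{\Theta(d)}$ factor achieved by LLL), the smaller the Gaussian noise variance $2^{-d^{\eta}}$ for which learning one hidden layer neural networks is shown to be hard, with $\eta$ approaching $1/2$ as $\delta \to 1$.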