On the Hardness of Learning One Hidden Layer Neural Networks
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:700-701, 2025.
Abstract
In this work, we consider the problem of learning one hidden layer ReLU neural networks with inputs from $\mathbb{R}^d$. It is well known, due to (Klivans and Sherstov, 2009), that without further assumptions on the distribution $\mathcal{D}$, e.g., when $\mathcal{D}$ can be supported over the Boolean hypercube, learning even one hidden layer neural networks is impossible (or "hard") for polynomial-time estimators under standard cryptographic assumptions. Given the success of neural networks in practice, a long line of recent work has instead studied the canonical continuous input distribution, where $\mathcal{D}$ is the isotropic Gaussian, i.e., $\mathcal{D} = N(0, I_d)$; this is also the setting we follow in this work. Yet, despite this effort, it remains open whether there is a polynomial-time algorithm for learning one hidden layer neural networks when $\mathcal{D} = N(0, I_d)$. It is known that a single neuron, i.e., a neural network with zero hidden layers, can be learned in polynomial time (Zarifis et al., 2024), while neural networks with more than two hidden layers are hard to learn (Chen et al., 2022). Nevertheless, the case of one hidden layer neural networks is not well understood. In this paper, we close this gap in the literature by answering the question of efficient learnability of neural networks with one hidden layer. We establish that, under the CLWE assumption from cryptography (Bruna et al., 2021), learning the class of one hidden layer neural networks of polynomial size under standard Gaussian inputs and polynomially small Gaussian noise is indeed computationally hard. Importantly, solving CLWE in polynomial time implies a polynomial-time quantum algorithm that solves the worst-case gap shortest vector problem (GapSVP) within polynomial factors, a task widely believed to be hard in cryptography and the algorithmic theory of lattices (Micciancio and Regev, 2009). En route, we prove the hardness of learning Lipschitz periodic functions under standard Gaussian inputs and polynomially small Gaussian noise; this improves upon the previous result of (Song et al., 2021), which established hardness under polynomially small adversarial noise. We also utilize the more general reductions between CLWE and classical LWE due to (Gupte et al., 2022). In particular, we show that if we assume the hardness of GapSVP with subexponential approximation factors $2^{O(d^{\delta})}$ for $\delta \in (0,1)$, then learning one hidden layer neural networks of polynomial size is hard under Gaussian noise of variance $2^{-d^{\eta}}$, where $\eta = \frac{\delta}{1+\delta} \in (0, 1/2)$. The current state-of-the-art algorithm for GapSVP is the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm (Lenstra et al., 1982), which achieves an approximation factor of $2^{\Theta(d)}$. Hence, our results show that a polynomial-time learning algorithm for one hidden layer neural networks under Gaussian noise of any variance $\sigma^2 \geq 2^{-o(\sqrt{d})}$ would imply a major algorithmic breakthrough in the theory of lattices.
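
To make the learning setup described above concrete, the following is a minimal sketch of how labeled samples in this problem are generated: inputs are standard Gaussian in $\mathbb{R}^d$, and labels are the output of a one hidden layer ReLU network of polynomial size plus a small amount of Gaussian noise. The concrete dimension, width, noise level, and weight distributions below are illustrative assumptions for the example, not parameters taken from the paper.

```python
# Sketch of the data-generating process studied in the paper:
# (x, y) with x ~ N(0, I_d) and y = f(x) + Gaussian noise, where
# f(x) = sum_i a_i * ReLU(<w_i, x> + b_i) is a one hidden layer ReLU network.
# All numeric choices below are illustrative, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

d = 50            # input dimension (illustrative)
k = 100           # hidden width, poly(d) (illustrative)
sigma = d ** -2.0 # polynomially small noise standard deviation (illustrative)

# Target network parameters; the learner only sees samples, never these weights.
W = rng.standard_normal((k, d)) / np.sqrt(d)  # hidden-layer weights w_i (rows)
b = rng.standard_normal(k)                    # hidden-layer biases b_i
a = rng.standard_normal(k) / np.sqrt(k)       # output-layer weights a_i


def sample(n):
    """Draw n labeled examples: standard Gaussian inputs, noisy network labels."""
    X = rng.standard_normal((n, d))           # each row is x ~ N(0, I_d)
    y = np.maximum(X @ W.T + b, 0.0) @ a      # f(x) for every row, vectorized ReLU layer
    y = y + sigma * rng.standard_normal(n)    # additive Gaussian label noise
    return X, y


X_train, y_train = sample(10_000)
```

The hardness result concerns recovering (or predicting with) the unknown network from such samples in polynomial time; the sketch only fixes one arbitrary target network to show the shape of the data.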
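
As a worked illustration of the trade-off stated above between the assumed GapSVP approximation factor and the tolerated noise variance, one can plug concrete values of $\delta$ into the stated relation (the particular values below are only examples):
\[
  \eta \;=\; \frac{\delta}{1+\delta} \in \Bigl(0, \tfrac{1}{2}\Bigr), \qquad \delta \in (0,1),
\]
\[
  \delta = \tfrac{1}{2} \;\Rightarrow\; \eta = \tfrac{1}{3}, \qquad
  \delta = \tfrac{3}{4} \;\Rightarrow\; \eta = \tfrac{3}{7}, \qquad
  \delta \to 1 \;\Rightarrow\; \eta \to \tfrac{1}{2}.
\]
That is, the larger the subexponential approximation factor $2^{O(d^{\delta})}$ for which GapSVP is assumed hard (a stronger assumption as $\delta$ grows, though still far below the $2^{\Theta(d)}$ factor achieved by LLL), the smaller the Gaussian noise variance $2^{-d^{\eta}}$ for which learning one hidden layer neural networks is shown to be hard, with $\eta$ approaching $1/2$ as $\delta \to 1$.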