Depth Separations in Neural Networks: Separating the Dimension from the Accuracy
Proceedings of the Thirty-Eighth Conference on Learning Theory, PMLR 291:5108-5142, 2025.
Abstract
We prove an exponential separation between depth 2 and depth 3 neural networks when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution supported in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in Safran et al. (2019), and proves that the curse of dimensionality manifests itself in depth 2 approximation even in cases where the target function can be represented efficiently by a depth 3 network. Previously, the lower bounds used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, the target accuracy, or (some measure of) the size of the domain of approximation scale \emph{polynomially} with the input dimension, whereas in our result these parameters are fixed to be \emph{constants} independent of the input dimension, making them simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst-case to average-case random self-reducibility argument, allowing us to leverage lower bounds for depth 2 threshold circuits in a new domain.
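The claim above bundles several parameters (Lipschitz constant, accuracy, domain, weight magnitudes), so a rough paraphrase in theorem form may help keep them apart. The statement below is an informal reading of the abstract, not the paper's formal theorem; in particular, the exact exponents and the reading of "efficiently" as $\mathrm{poly}(d)$ size are assumptions on our part.

```latex
% Informal restatement of the separation described in the abstract.
% This is a paraphrase; the precise constants, network model, and exponents
% are those of the paper and are not pinned down here.
\begin{theorem}[informal restatement]
There exist a constant accuracy level $\epsilon > 0$, an
$\mathcal{O}(1)$-Lipschitz function $f \colon \mathbb{B}^d \to \mathbb{R}$
on the unit ball, and a probability distribution $\mu$ supported on
$\mathbb{B}^d$, such that:
\begin{itemize}
  \item $f$ can be represented efficiently (e.g.\ with $\mathrm{poly}(d)$ neurons)
        by a depth 3 network, yet
  \item any depth 2 network $N$ whose weights are exponentially bounded in $d$ and
        which achieves
        $\mathbb{E}_{x \sim \mu}\bigl(N(x) - f(x)\bigr)^2 \le \epsilon$
        must have width that grows exponentially with $d$.
\end{itemize}
Here $\epsilon$, the Lipschitz constant of $f$, and the domain $\mathbb{B}^d$
are all independent of the input dimension $d$.
\end{theorem}
```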