Depth Separations in Neural Networks: Separating the Dimension from the Accuracy

Itay Safran, Daniel Reichman, Paul Valiant
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:5108-5142, 2025.

Abstract

We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating an $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in Safran et al. (2019), and proves that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network. Previously, lower bounds that were used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, the target accuracy, or (some measure of) the size of the domain of approximation scale \emph{polynomially} with the input dimension, whereas in our result these parameters are fixed to be \emph{constants} independent of the input dimension: our parameters are simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst- to average-case random self-reducibility argument, allowing us to leverage depth 2 threshold circuit lower bounds in a new domain.
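To make the shape of the claimed separation concrete, the following is a schematic reading of the abstract only, not the paper's exact theorem: the hard functions $f_d$, the distributions $\mu_d$, the squared-error criterion, and the constants $c,\epsilon_0$ below are placeholders standing in for the constructions and parameters given in the paper.

% Schematic form of an exponential depth 2 vs. depth 3 separation with
% constant Lipschitzness and constant accuracy; all named objects and
% constants are placeholders, not quantities taken from the paper.
There exist $\mathcal{O}(1)$-Lipschitz functions $f_d:\mathbb{R}^d\to\mathbb{R}$,
computable by depth~3 networks of size $\mathrm{poly}(d)$, and probability
distributions $\mu_d$ supported in the unit ball of $\mathbb{R}^d$, such that
every depth~2 network $N_d$ of width at most $2^{cd}$, whose weights are
exponentially bounded in $d$, satisfies
\[
  \operatorname*{\mathbb{E}}_{x\sim\mu_d}\bigl[(N_d(x)-f_d(x))^2\bigr] \;\ge\; \epsilon_0,
\]
for universal constants $c,\epsilon_0>0$ that do not depend on the input dimension $d$.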

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-safran25a,
  title     = {Depth Separations in Neural Networks: Separating the Dimension from the Accuracy},
  author    = {Safran, Itay and Reichman, Daniel and Valiant, Paul},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages     = {5108--5142},
  year      = {2025},
  editor    = {Haghtalab, Nika and Moitra, Ankur},
  volume    = {291},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/safran25a/safran25a.pdf},
  url       = {https://proceedings.mlr.press/v291/safran25a.html},
  abstract  = {We prove an exponential separation between depth 2 and depth 3 neural networks, when approximating a $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in Safran et al. (2019), and proves that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network. Previously, lower bounds that were used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, target accuracy or (some measure of) the size of the domain of approximation scale \emph{polynomially} with the input dimension, whereas in our result these parameters are fixed to be \emph{constants} independent of the input dimension: our parameters are simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst- to average-case random self-reducibility argument, allowing us to leverage depth 2 threshold circuits lower bounds in a new domain.}
}
APA
Safran, I., Reichman, D. & Valiant, P. (2025). Depth Separations in Neural Networks: Separating the Dimension from the Accuracy. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:5108-5142. Available from https://proceedings.mlr.press/v291/safran25a.html.
