Depth Separation in Norm-Bounded Infinite-Width Neural Networks

Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan Srebro
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4082-4114, 2024.

Abstract

We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights (sum of squares of all weights in the network). Whereas previous depth separation results focused on separation in terms of width, such results do not give insight into whether depth determines if it is possible to learn a network that generalizes well even when the network width is unbounded. Here, we study separation in terms of the sample complexity required for learnability. Specifically, we show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks (with any value for the norm). We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.
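To make the complexity measure concrete, here is an illustrative sketch (not part of the original page) using the standard convention for norm-controlled ReLU networks; the paper's exact definition may differ in details such as bias terms or output scaling. A width-$k$ depth-2 ReLU network and its overall squared $\ell_2$ weight norm are

\[
f(x) \;=\; \sum_{j=1}^{k} v_j \,\bigl[\langle w_j, x\rangle + b_j\bigr]_+ ,
\qquad
C(f) \;=\; \sum_{j=1}^{k} \left( v_j^2 + \|w_j\|_2^2 + b_j^2 \right),
\]

where $[\,\cdot\,]_+ = \max\{\cdot, 0\}$ is the ReLU. In this setting, complexity is controlled by $C(f)$ while the width $k$ may grow without bound (hence "infinite width"); the separation result says that, for input dimension $d$, some functions are learnable from $\mathrm{poly}(d)$ samples by norm-controlled depth-3 networks but require super-polynomial (indeed, not sub-exponential) sample complexity for norm-controlled depth-2 networks.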

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-parkinson24a,
  title     = {Depth Separation in Norm-Bounded Infinite-Width Neural Networks},
  author    = {Parkinson, Suzanna and Ongie, Greg and Willett, Rebecca and Shamir, Ohad and Srebro, Nathan},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {4082--4114},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/parkinson24a/parkinson24a.pdf},
  url       = {https://proceedings.mlr.press/v247/parkinson24a.html},
  abstract  = {We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights (sum of squares of all weights in the network). Whereas previous depth separation results focused on separation in terms of width, such results do not give insight into whether depth determines if it is possible to learn a network that generalizes well even when the network width is unbounded. Here, we study separation in terms of the sample complexity required for learnability. Specifically, we show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks (with any value for the norm). We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.}
}
APA
Parkinson, S., Ongie, G., Willett, R., Shamir, O. & Srebro, N. (2024). Depth Separation in Norm-Bounded Infinite-Width Neural Networks. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4082-4114. Available from https://proceedings.mlr.press/v247/parkinson24a.html.
