Width is Less Important than Depth in ReLU Neural Networks

Gal Vardi, Gilad Yehudai, Ohad Shamir
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1249-1281, 2022.

Abstract

We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network’s architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of these constructions cause an extra polynomial factor in the number of parameters over the target network. We also show an exact representation of wide and shallow networks using deep and narrow networks which, in certain cases, does not increase the number of parameters over the target network.

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-vardi22a,
  title     = {Width is Less Important than Depth in ReLU Neural Networks},
  author    = {Vardi, Gal and Yehudai, Gilad and Shamir, Ohad},
  booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory},
  pages     = {1249--1281},
  year      = {2022},
  editor    = {Loh, Po-Ling and Raginsky, Maxim},
  volume    = {178},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--05 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v178/vardi22a/vardi22a.pdf},
  url       = {https://proceedings.mlr.press/v178/vardi22a.html},
  abstract  = {We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network’s architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of these constructions cause an extra polynomial factor in the number of parameters over the target network. We also show an exact representation of wide and shallow networks using deep and narrow networks which, in certain cases, does not increase the number of parameters over the target network.}
}
Endnote
%0 Conference Paper
%T Width is Less Important than Depth in ReLU Neural Networks
%A Gal Vardi
%A Gilad Yehudai
%A Ohad Shamir
%B Proceedings of Thirty Fifth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2022
%E Po-Ling Loh
%E Maxim Raginsky
%F pmlr-v178-vardi22a
%I PMLR
%P 1249--1281
%U https://proceedings.mlr.press/v178/vardi22a.html
%V 178
%X We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network’s architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of these constructions cause an extra polynomial factor in the number of parameters over the target network. We also show an exact representation of wide and shallow networks using deep and narrow networks which, in certain cases, does not increase the number of parameters over the target network.
APA
Vardi, G., Yehudai, G., & Shamir, O. (2022). Width is Less Important than Depth in ReLU Neural Networks. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1249-1281. Available from https://proceedings.mlr.press/v178/vardi22a.html.