benefits of depth in neural networks

Matus Telgarsky
29th Annual Conference on Learning Theory, PMLR 49:1517-1539, 2016.

Abstract

For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large: they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed semi-algebraic gates, which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^(k^3)) total tree nodes are required).
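A minimal sketch, assuming NumPy, of the kind of hard function behind such depth separations (illustrative only, not the paper's exact construction): composing a two-ReLU tent map with itself k times yields a k-layer, constant-width network whose output crosses 1/2 at 2^k points, far more oscillations than any O(k)-layer network of subexponential size can track.

import numpy as np

def tent(x):
    # Tent map on [0, 1], written with two ReLUs: 2*relu(x) - 4*relu(x - 1/2).
    return 2 * np.maximum(x, 0.0) - 4 * np.maximum(x - 0.5, 0.0)

def deep_sawtooth(x, k):
    # k-fold composition of the tent map: a k-layer, two-node-per-layer ReLU network.
    for _ in range(k):
        x = tent(x)
    return x

# The composed function crosses the level 1/2 at 2^k points; counting the
# crossings shows the oscillation count doubling with each extra layer.
xs = np.linspace(0.0, 1.0, 100001)
for k in (1, 2, 3, 4, 5):
    ys = deep_sawtooth(xs, k)
    crossings = np.count_nonzero(np.diff(ys > 0.5))
    print(f"k={k}: {crossings} crossings of 1/2 (expected 2^{k} = {2 ** k})")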

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-telgarsky16,
  title     = {benefits of depth in neural networks},
  author    = {Telgarsky, Matus},
  booktitle = {29th Annual Conference on Learning Theory},
  pages     = {1517--1539},
  year      = {2016},
  editor    = {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume    = {49},
  series    = {Proceedings of Machine Learning Research},
  address   = {Columbia University, New York, New York, USA},
  month     = {23--26 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v49/telgarsky16.pdf},
  url       = {https://proceedings.mlr.press/v49/telgarsky16.html},
  abstract  = {For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large: they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed \emph{semi-algebraic gates}, which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^(k^3)) total tree nodes are required).}
}
Endnote
%0 Conference Paper
%T benefits of depth in neural networks
%A Matus Telgarsky
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir
%F pmlr-v49-telgarsky16
%I PMLR
%P 1517--1539
%U https://proceedings.mlr.press/v49/telgarsky16.html
%V 49
%X For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large: they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed semi-algebraic gates, which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^(k^3)) total tree nodes are required).
RIS
TY - CPAPER
TI - benefits of depth in neural networks
AU - Matus Telgarsky
BT - 29th Annual Conference on Learning Theory
DA - 2016/06/06
ED - Vitaly Feldman
ED - Alexander Rakhlin
ED - Ohad Shamir
ID - pmlr-v49-telgarsky16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 49
SP - 1517
EP - 1539
L1 - http://proceedings.mlr.press/v49/telgarsky16.pdf
UR - https://proceedings.mlr.press/v49/telgarsky16.html
AB - For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large: they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed semi-algebraic gates, which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^(k^3)) total tree nodes are required).
ER -
APA
Telgarsky, M. (2016). benefits of depth in neural networks. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:1517-1539. Available from https://proceedings.mlr.press/v49/telgarsky16.html.