Benefits of depth in neural networks

Matus Telgarsky
29th Annual Conference on Learning Theory, PMLR 49:1517-1539, 2016.

Abstract

For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which cannot be approximated by networks with O(k) layers unless they are exponentially large: they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed semi-algebraic gates, which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^(k^3)) total tree nodes are required).
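
The separation above is built from compositions of a simple triangle (tent) map, each copy of which is exactly representable by two ReLU gates. The sketch below is an illustrative reconstruction rather than code from the paper (the names relu, tent, and deep_sawtooth are ours): it shows how k such layers yield a function that oscillates 2^(k-1) times on [0,1], the kind of rapidly oscillating target that shallow networks can only approximate with exponentially many nodes.

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def tent(x):
        # Triangle (tent) map on [0, 1], written with two ReLU gates:
        # tent(x) = relu(2x) - relu(4x - 2) = 2x on [0, 1/2], 2 - 2x on [1/2, 1].
        return relu(2.0 * x) - relu(4.0 * x - 2.0)

    def deep_sawtooth(x, k):
        # k-fold composition: a ReLU network with k layers and 2 nodes per layer.
        # Its graph has 2^k linear pieces (2^(k-1) peaks) on [0, 1], which is the
        # source of the exponential gap against shallow networks.
        for _ in range(k):
            x = tent(x)
        return x

    x = np.linspace(0.0, 1.0, 1001)
    y = deep_sawtooth(x, 4)          # sawtooth with 2^3 = 8 peaks
    print(y.min(), y.max())          # stays in [0, 1]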

Cite this Paper


BibTeX
@InProceedings{pmlr-v49-telgarsky16,
  title     = {Benefits of depth in neural networks},
  author    = {Matus Telgarsky},
  booktitle = {29th Annual Conference on Learning Theory},
  pages     = {1517--1539},
  year      = {2016},
  editor    = {Vitaly Feldman and Alexander Rakhlin and Ohad Shamir},
  volume    = {49},
  series    = {Proceedings of Machine Learning Research},
  address   = {Columbia University, New York, New York, USA},
  month     = {23--26 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v49/telgarsky16.pdf},
  url       = {http://proceedings.mlr.press/v49/telgarsky16.html}
}
APA
Telgarsky, M. (2016). Benefits of depth in neural networks. 29th Annual Conference on Learning Theory, in PMLR 49:1517-1539.
