benefits of depth in neural networks

Matus Telgarsky

benefits of depth in neural networks

Matus Telgarsky

29th Annual Conference on Learning Theory, PMLR 49:1517-1539, 2016.

Abstract

For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which can not be approximated by networks with O(k) layers unless they are exponentially large — they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed \emphsemi-algebraic gates which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^k^3) total tree nodes are required).

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-telgarsky16,
  title = 	 {benefits of depth in neural networks},
  author = 	 {Telgarsky, Matus},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {1517--1539},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/telgarsky16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/telgarsky16.html},
  abstract = 	 {For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which can not be approximated by networks with O(k) layers unless they are exponentially large — they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed \emphsemi-algebraic gates which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^k^3) total tree nodes are required). }
}

Endnote

%0 Conference Paper
%T benefits of depth in neural networks
%A Matus Telgarsky
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-telgarsky16
%I PMLR
%P 1517--1539
%U https://proceedings.mlr.press/v49/telgarsky16.html
%V 49
%X For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which can not be approximated by networks with O(k) layers unless they are exponentially large — they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed \emphsemi-algebraic gates which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^k^3) total tree nodes are required).

RIS


TY  - CPAPER
TI  - benefits of depth in neural networks
AU  - Matus Telgarsky
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-telgarsky16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 1517
EP  - 1539
L1  - http://proceedings.mlr.press/v49/telgarsky16.pdf
UR  - https://proceedings.mlr.press/v49/telgarsky16.html
AB  - For any positive integer k, there exist neural networks with Θ(k^3) layers, Θ(1) nodes per layer, and Θ(1) distinct parameters which can not be approximated by networks with O(k) layers unless they are exponentially large — they must possess Ω(2^k) nodes. This result is proved here for a class of nodes termed \emphsemi-algebraic gates which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: Ω(2^k^3) total tree nodes are required). 
ER  -

APA


Telgarsky, M.. (2016). benefits of depth in neural networks. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:1517-1539 Available from https://proceedings.mlr.press/v49/telgarsky16.html.

Related Material

Download PDF