The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

Eran Malach, Gilad Yehudai, Shai Shalev-Schwartz, Ohad Shamir
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3265-3295, 2021.

Abstract

Several recent works have shown separation results between deep neural networks and hypothesis classes with inferior approximation capacity, such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricate connection between learnability and approximation capacity. We show that the learnability of a target function with deep networks depends on the ability of simpler classes to approximate the target. Specifically, we show that a necessary condition for a function to be learnable by gradient descent on deep neural networks is that the function can be approximated, at least in a weak sense, by shallow neural networks. We also show that a class of functions can be learned by an efficient statistical query algorithm if and only if it can be approximated in a weak sense by some kernel class. We give several examples of functions that demonstrate depth separation, and conclude that they cannot be efficiently learned, even by a hypothesis class that can efficiently approximate them.
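
As a rough illustration only (this formalization is an assumption on our part, not necessarily the paper's exact definition), "approximation in a weak sense" of a target f by a hypothesis class H over an input distribution D is often stated as a non-negligible correlation requirement rather than a small-error requirement:

\exists\, h \in \mathcal{H} \quad \text{s.t.} \quad \Big| \mathbb{E}_{x \sim \mathcal{D}}\big[ h(x)\, f(x) \big] \Big| \;\geq\; \frac{1}{\mathrm{poly}(d)},

whereas strong approximation would instead demand small error, e.g. \mathbb{E}_{x \sim \mathcal{D}}\big[ (h(x) - f(x))^2 \big] \leq \epsilon. Under such a reading, the paper's results say that if no shallow network (resp. no kernel class) achieves even this weak correlation with the target, then gradient descent on deep networks (resp. any efficient statistical query algorithm) cannot learn it efficiently.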

Cite this Paper


BibTeX
@InProceedings{pmlr-v134-malach21a,
  title     = {The Connection Between Approximation, Depth Separation and Learnability in Neural Networks},
  author    = {Malach, Eran and Yehudai, Gilad and Shalev-Schwartz, Shai and Shamir, Ohad},
  booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages     = {3265--3295},
  year      = {2021},
  editor    = {Belkin, Mikhail and Kpotufe, Samory},
  volume    = {134},
  series    = {Proceedings of Machine Learning Research},
  month     = {15--19 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v134/malach21a/malach21a.pdf},
  url       = {https://proceedings.mlr.press/v134/malach21a.html},
  abstract  = {Several recent works have shown separation results between deep neural networks, and hypothesis classes with inferior approximation capacity such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricate connection between learnability and approximation capacity. We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target. Specifically, we show that a necessary condition for a function to be learnable by gradient descent on deep neural networks is to be able to approximate the function, at least in a weak sense, with shallow neural networks. We also show that a class of functions can be learned by an efficient statistical query algorithm if and only if it can be approximated in a weak sense by some kernel class. We give several examples of functions which demonstrate depth separation, and conclude that they cannot be efficiently learned, even by a hypothesis class that can efficiently approximate them.}
}
Endnote
%0 Conference Paper
%T The Connection Between Approximation, Depth Separation and Learnability in Neural Networks
%A Eran Malach
%A Gilad Yehudai
%A Shai Shalev-Schwartz
%A Ohad Shamir
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-malach21a
%I PMLR
%P 3265--3295
%U https://proceedings.mlr.press/v134/malach21a.html
%V 134
%X Several recent works have shown separation results between deep neural networks, and hypothesis classes with inferior approximation capacity such as shallow networks or kernel classes. On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks. In this work we study the intricate connection between learnability and approximation capacity. We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target. Specifically, we show that a necessary condition for a function to be learnable by gradient descent on deep neural networks is to be able to approximate the function, at least in a weak sense, with shallow neural networks. We also show that a class of functions can be learned by an efficient statistical query algorithm if and only if it can be approximated in a weak sense by some kernel class. We give several examples of functions which demonstrate depth separation, and conclude that they cannot be efficiently learned, even by a hypothesis class that can efficiently approximate them.
APA
Malach, E., Yehudai, G., Shalev-Schwartz, S. & Shamir, O. (2021). The Connection Between Approximation, Depth Separation and Learnability in Neural Networks. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:3265-3295. Available from https://proceedings.mlr.press/v134/malach21a.html.