A Deep Conditioning Treatment of Neural Networks

Naman Agarwal, Pranjal Awasthi, Satyen Kale
Proceedings of the 32nd International Conference on Algorithmic Learning Theory, PMLR 132:249-305, 2021.

Abstract

We study the role of depth in training randomly initialized overparameterized neural networks. We give a general result showing that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data. This result holds for arbitrary non-linear activation functions under a certain normalization. We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers, via the neural tangent kernel. As applications of these general results, we provide a generalization of the results of Das et al. (2019) showing that learnability of deep random neural networks with a large class of non-linear activations degrades exponentially with depth. We also show how benign overfitting can occur in deep neural networks via the results of Bartlett et al. (2019b). We also give experimental evidence that normalized versions of ReLU are a viable alternative to more complex operations like Batch Normalization in training deep neural networks.
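The abstract's notion of a "normalized" activation can be illustrated with a small sketch. This is not code from the paper; it only shows the standard normalization convention of scaling an activation so that its second moment under a standard Gaussian input equals 1 (the function name `normalized_relu` is my own). For ReLU, E[max(g, 0)^2] = 1/2 when g ~ N(0, 1), so the normalizing constant is sqrt(2).

```python
import numpy as np

def normalized_relu(x):
    """ReLU scaled by sqrt(2) so that E[normalized_relu(g)^2] = 1
    for g ~ N(0, 1)."""
    return np.sqrt(2.0) * np.maximum(x, 0.0)

# Empirical check of the normalization on a large Gaussian sample.
rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)
second_moment = np.mean(normalized_relu(g) ** 2)  # close to 1.0
```

Applying such a rescaling at every layer keeps pre-activation magnitudes stable as depth grows, which is the role Batch Normalization otherwise plays in the experiments the abstract refers to.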

Cite this Paper


BibTeX
@InProceedings{pmlr-v132-agarwal21b,
  title     = {A Deep Conditioning Treatment of Neural Networks},
  author    = {Agarwal, Naman and Awasthi, Pranjal and Kale, Satyen},
  booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  pages     = {249--305},
  year      = {2021},
  editor    = {Vitaly Feldman and Katrina Ligett and Sivan Sabato},
  volume    = {132},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--19 Mar},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v132/agarwal21b/agarwal21b.pdf},
  url       = {http://proceedings.mlr.press/v132/agarwal21b.html},
  abstract  = {We study the role of depth in training randomly initialized overparameterized neural networks. We give a general result showing that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data. This result holds for arbitrary non-linear activation functions under a certain normalization. We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers, via the neural tangent kernel. As applications of these general results, we provide a generalization of the results of Das et al. (2019) showing that learnability of deep random neural networks with a large class of non-linear activations degrades exponentially with depth. We also show how benign overfitting can occur in deep neural networks via the results of Bartlett et al. (2019b). We also give experimental evidence that normalized versions of ReLU are a viable alternative to more complex operations like Batch Normalization in training deep neural networks.}
}
Endnote
%0 Conference Paper
%T A Deep Conditioning Treatment of Neural Networks
%A Naman Agarwal
%A Pranjal Awasthi
%A Satyen Kale
%B Proceedings of the 32nd International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Vitaly Feldman
%E Katrina Ligett
%E Sivan Sabato
%F pmlr-v132-agarwal21b
%I PMLR
%P 249--305
%U http://proceedings.mlr.press/v132/agarwal21b.html
%V 132
%X We study the role of depth in training randomly initialized overparameterized neural networks. We give a general result showing that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data. This result holds for arbitrary non-linear activation functions under a certain normalization. We provide versions of the result that hold for training just the top layer of the neural network, as well as for training all layers, via the neural tangent kernel. As applications of these general results, we provide a generalization of the results of Das et al. (2019) showing that learnability of deep random neural networks with a large class of non-linear activations degrades exponentially with depth. We also show how benign overfitting can occur in deep neural networks via the results of Bartlett et al. (2019b). We also give experimental evidence that normalized versions of ReLU are a viable alternative to more complex operations like Batch Normalization in training deep neural networks.
APA
Agarwal, N., Awasthi, P. &amp; Kale, S. (2021). A Deep Conditioning Treatment of Neural Networks. Proceedings of the 32nd International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 132:249-305. Available from http://proceedings.mlr.press/v132/agarwal21b.html.