Disentangling Trainability and Generalization in Deep Neural Networks

Lechao Xiao, Jeffrey Pennington, Samuel Schoenholz
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:10462-10472, 2020.

Abstract

A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradient descent is governed by the Neural Tangent Kernel (NTK), and for deep networks the NTK itself maintains only weak data dependence. By analyzing the spectrum of the NTK, we formulate necessary conditions for trainability and generalization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs). We identify large regions of hyperparameter space for which networks can memorize the training set but completely fail to generalize. We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance. These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures. We include a \href{https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/disentangling_trainability_and_generalization.ipynb}{colab} notebook that reproduces the essential results of the paper.

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-xiao20b, title = {Disentangling Trainability and Generalization in Deep Neural Networks}, author = {Xiao, Lechao and Pennington, Jeffrey and Schoenholz, Samuel}, booktitle = {Proceedings of the 37th International Conference on Machine Learning}, pages = {10462--10472}, year = {2020}, editor = {Hal Daumé III and Aarti Singh}, volume = {119}, series = {Proceedings of Machine Learning Research}, month = {13--18 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v119/xiao20b/xiao20b.pdf}, url = { http://proceedings.mlr.press/v119/xiao20b.html }, abstract = {A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradient descent is governed by the Neural Tangent Kernel (NTK), and for deep networks the NTK itself maintains only weak data dependence. By analyzing the spectrum of the NTK, we formulate necessary conditions for trainability and generalization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs). We identify large regions of hyperparameter space for which networks can memorize the training set but completely fail to generalize. We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance. These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures. We include a \href{https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/disentangling_trainability_and_generalization.ipynb}{colab} notebook that reproduces the essential results of the paper.} }
Endnote
%0 Conference Paper %T Disentangling Trainability and Generalization in Deep Neural Networks %A Lechao Xiao %A Jeffrey Pennington %A Samuel Schoenholz %B Proceedings of the 37th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2020 %E Hal Daumé III %E Aarti Singh %F pmlr-v119-xiao20b %I PMLR %P 10462--10472 %U http://proceedings.mlr.press/v119/xiao20b.html %V 119 %X A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data. In this work, we provide such a characterization in the limit of very wide and very deep networks, for which the analysis simplifies considerably. For wide networks, the trajectory under gradient descent is governed by the Neural Tangent Kernel (NTK), and for deep networks the NTK itself maintains only weak data dependence. By analyzing the spectrum of the NTK, we formulate necessary conditions for trainability and generalization across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs). We identify large regions of hyperparameter space for which networks can memorize the training set but completely fail to generalize. We find that CNNs without global average pooling behave almost identically to FCNs, but that CNNs with pooling have markedly different and often better generalization performance. These theoretical results are corroborated experimentally on CIFAR10 for a variety of network architectures. We include a \href{https://colab.research.google.com/github/google/neural-tangents/blob/master/notebooks/disentangling_trainability_and_generalization.ipynb}{colab} notebook that reproduces the essential results of the paper.
APA
Xiao, L., Pennington, J. & Schoenholz, S.. (2020). Disentangling Trainability and Generalization in Deep Neural Networks. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:10462-10472 Available from http://proceedings.mlr.press/v119/xiao20b.html .

Related Material