Neural tangent kernel at initialization: linear width suffices

Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:110-118, 2023.

Abstract

In this paper we study the problem of lower bounding the minimum eigenvalue of the neural tangent kernel (NTK) at initialization, an important quantity for the theoretical analysis of training in neural networks. We consider feedforward neural networks with smooth activation functions. Without any distributional assumptions on the input, we present a novel result: we show that for suitable initialization variance, $\widetilde{\Omega}(n)$ width, where $n$ is the number of training samples, suffices to ensure that the NTK at initialization is positive definite, improving prior results for smooth activations under our setting. Prior to our work, the sufficiency of linear width had been shown only for networks with ReLU activation functions, while sublinear width had been shown to suffice for smooth activations only under additional assumptions on the distribution of the data. The technical challenge in the analysis stems from the layerwise inhomogeneity of smooth activation functions, and we handle this challenge using {\em generalized} Hermite series expansions of such activations.
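
To make the central quantity concrete, the following sketch (not from the paper; the network size, tanh activation, and initialization scale are illustrative assumptions) uses JAX to form the empirical NTK Gram matrix $K_{ij} = \langle \nabla_\theta f(\theta; x_i), \nabla_\theta f(\theta; x_j) \rangle$ for a small feedforward network with a smooth activation at a random initialization, and reports its minimum eigenvalue, the quantity the paper lower bounds.

# Minimal sketch (not from the paper): empirically checking the minimum eigenvalue
# of the NTK Gram matrix at initialization for a feedforward network with a smooth
# (tanh) activation. Width, depth, and the initialization scale are illustrative.
import jax
import jax.numpy as jnp

def init_params(key, d_in, width, depth, sigma=1.0):
    # Gaussian initialization scaled by 1/sqrt(fan-in); sigma sets the variance scale.
    sizes = [d_in] + [width] * depth + [1]
    params = []
    for m, d in zip(sizes[1:], sizes[:-1]):
        key, sub = jax.random.split(key)
        params.append(sigma * jax.random.normal(sub, (m, d)) / jnp.sqrt(d))
    return params

def forward(params, x):
    # Feedforward network with smooth tanh activations and a scalar output.
    h = x
    for W in params[:-1]:
        h = jnp.tanh(W @ h)
    return (params[-1] @ h)[0]

def ntk_min_eigenvalue(params, X):
    # K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>; return its smallest eigenvalue.
    per_example_grads = jax.vmap(lambda x: jax.grad(forward)(params, x))(X)
    J = jnp.concatenate([g.reshape(X.shape[0], -1) for g in per_example_grads], axis=1)
    K = J @ J.T
    return jnp.linalg.eigvalsh(K)[0]  # eigvalsh returns eigenvalues in ascending order

key, data_key = jax.random.split(jax.random.PRNGKey(0))
n, d_in, width, depth = 32, 16, 64, 3              # width on the order of n (illustrative)
X = jax.random.normal(data_key, (n, d_in))
X = X / jnp.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
params = init_params(key, d_in, width, depth)
print("minimum NTK eigenvalue at initialization:", ntk_min_eigenvalue(params, X))

With the width taken on the order of $n$ and unit-norm inputs, the printed minimum eigenvalue should typically be bounded away from zero, which is the positive-definiteness property the paper establishes with high probability over the initialization.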

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-banerjee23a,
  title     = {Neural tangent kernel at initialization: linear width suffices},
  author    = {Banerjee, Arindam and Cisneros-Velarde, Pedro and Zhu, Libin and Belkin, Mikhail},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {110--118},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/banerjee23a/banerjee23a.pdf},
  url       = {https://proceedings.mlr.press/v216/banerjee23a.html},
  abstract  = {In this paper we study the problem of lower bounding the minimum eigenvalue of the neural tangent kernel (NTK) at initialization, an important quantity for the theoretical analysis of training in neural networks. We consider feedforward neural networks with smooth activation functions. Without any distributional assumptions on the input, we present a novel result: we show that for suitable initialization variance, $\widetilde{\Omega}(n)$ width, where $n$ is the number of training samples, suffices to ensure that the NTK at initialization is positive definite, improving prior results for smooth activations under our setting. Prior to our work, the sufficiency of linear width has only been shown either for networks with ReLU activation functions, and sublinear width has been shown for smooth networks but with additional conditions on the distribution of the data. The technical challenge in the analysis stems from the layerwise inhomogeneity of smooth activation functions and we handle the challenge using {\em generalized} Hermite series expansion of such activations.}
}
Endnote
%0 Conference Paper
%T Neural tangent kernel at initialization: linear width suffices
%A Arindam Banerjee
%A Pedro Cisneros-Velarde
%A Libin Zhu
%A Mikhail Belkin
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-banerjee23a
%I PMLR
%P 110--118
%U https://proceedings.mlr.press/v216/banerjee23a.html
%V 216
%X In this paper we study the problem of lower bounding the minimum eigenvalue of the neural tangent kernel (NTK) at initialization, an important quantity for the theoretical analysis of training in neural networks. We consider feedforward neural networks with smooth activation functions. Without any distributional assumptions on the input, we present a novel result: we show that for suitable initialization variance, $\widetilde{\Omega}(n)$ width, where $n$ is the number of training samples, suffices to ensure that the NTK at initialization is positive definite, improving prior results for smooth activations under our setting. Prior to our work, the sufficiency of linear width has only been shown either for networks with ReLU activation functions, and sublinear width has been shown for smooth networks but with additional conditions on the distribution of the data. The technical challenge in the analysis stems from the layerwise inhomogeneity of smooth activation functions and we handle the challenge using {\em generalized} Hermite series expansion of such activations.
APA
Banerjee, A., Cisneros-Velarde, P., Zhu, L. & Belkin, M. (2023). Neural tangent kernel at initialization: linear width suffices. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:110-118. Available from https://proceedings.mlr.press/v216/banerjee23a.html.
