Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space

Alistair Shilton, Sunil Gupta, Santu Rana, Svetha Venkatesh
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:31435-31488, 2023.

Abstract

The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime, but a poor approximation for narrower networks, as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel Banach spaces (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence in which the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.
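To make the linearization that the abstract contrasts with concrete, the following sketch (in JAX/Python) compares a small fully-connected network against its first-order Taylor expansion around the initial weights, i.e. the NTK-style approximation f(x; w0) + <grad_w f(x; w0), w - w0>. This is not the paper's RKBS construction; the network shape, activation, data point, and the init_params / mlp / linearized helpers are arbitrary illustrative choices, used only to show the approximation error growing as the weights move further from initialization.

# Minimal sketch (assumed setup, not the paper's method): compare a small MLP
# with its first-order Taylor (NTK-style) linearization around the initial
# weights, and watch the approximation degrade as the weight step grows.
import jax
import jax.numpy as jnp

def init_params(key, widths=(2, 64, 1)):
    """Randomly initialize a fully-connected network (illustrative choice)."""
    params = []
    for m, n in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n, m)) / jnp.sqrt(m), jnp.zeros(n)))
    return params

def mlp(params, x):
    """Forward pass with tanh activations on the hidden layers."""
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return (W @ x + b)[0]

def linearized(params0, delta, x):
    """First-order Taylor expansion of the network around params0."""
    f0, df = jax.jvp(lambda p: mlp(p, x), (params0,), (delta,))
    return f0 + df

key = jax.random.PRNGKey(0)
params0 = init_params(key)
x = jnp.array([0.3, -0.7])

# Take progressively larger steps away from the initial weights and compare
# the exact network output with the linearization anchored at initialization.
direction = jax.tree_util.tree_map(jnp.ones_like, params0)
for scale in (0.01, 0.1, 1.0):
    delta = jax.tree_util.tree_map(lambda d: scale * d, direction)
    moved = jax.tree_util.tree_map(lambda p, d: p + d, params0, delta)
    gap = abs(mlp(moved, x) - linearized(params0, delta, x))
    print(f"step scale {scale:5.2f}: |exact - linearized| = {float(gap):.4f}")

For a wide network and a small learning rate the gap stays near zero, which is the regime where the RKHS (NTK) picture is accurate; for larger weight steps, as in narrower networks, the gap grows, which is the regime the paper's exact RKBS representation is designed to cover.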

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-shilton23a,
  title     = {Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space},
  author    = {Shilton, Alistair and Gupta, Sunil and Rana, Santu and Venkatesh, Svetha},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {31435--31488},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/shilton23a/shilton23a.pdf},
  url       = {https://proceedings.mlr.press/v202/shilton23a.html},
  abstract  = {The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime, but a poor approximation for narrower networks, as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel Banach spaces (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence in which the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.}
}
Endnote
%0 Conference Paper
%T Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space
%A Alistair Shilton
%A Sunil Gupta
%A Santu Rana
%A Svetha Venkatesh
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-shilton23a
%I PMLR
%P 31435--31488
%U https://proceedings.mlr.press/v202/shilton23a.html
%V 202
%X The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime, but a poor approximation for narrower networks, as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel Banach spaces (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence in which the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.
APA
Shilton, A., Gupta, S., Rana, S., & Venkatesh, S. (2023). Gradient Descent in Neural Networks as Sequential Learning in Reproducing Kernel Banach Space. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:31435-31488. Available from https://proceedings.mlr.press/v202/shilton23a.html.