The Curse of Depth in Kernel Regime

Soufiane Hayou, Arnaud Doucet, Judith Rousseau
Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops, PMLR 163:41-47, 2022.

Abstract

Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Empirical results in Lee et al. (2019) demonstrated the high performance of a linearized version of training in the so-called NTK regime. In this paper, we show that the large-depth limit of this regime is unexpectedly trivial, and we fully characterize the rate of convergence to this trivial regime.
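
As a brief recap of the objects the abstract refers to (these are the standard definitions from Jacot et al. (2018) and Lee et al. (2019), not notation taken from the paper itself): for a network $f_\theta$ with parameters $\theta$, the empirical NTK and the linearized model trained in the NTK regime are

\[
\Theta_{\theta}(x, x') = \nabla_\theta f_\theta(x)^\top \nabla_\theta f_\theta(x'),
\qquad
f_t^{\mathrm{lin}}(x) = f_{\theta_0}(x) + \nabla_\theta f_{\theta_0}(x)^\top (\theta_t - \theta_0).
\]

Under gradient flow on a squared loss, $f_t^{\mathrm{lin}}$ evolves by kernel gradient descent with the fixed kernel $\Theta_{\theta_0}$; the triviality result above concerns the behavior of this kernel in the large-depth limit.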

Cite this Paper


BibTeX
@InProceedings{pmlr-v163-hayou22a,
  title     = {The Curse of Depth in Kernel Regime},
  author    = {Hayou, Soufiane and Doucet, Arnaud and Rousseau, Judith},
  booktitle = {Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops},
  pages     = {41--47},
  year      = {2022},
  editor    = {Pradier, Melanie F. and Schein, Aaron and Hyland, Stephanie and Ruiz, Francisco J. R. and Forde, Jessica Z.},
  volume    = {163},
  series    = {Proceedings of Machine Learning Research},
  month     = {13 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v163/hayou22a/hayou22a.pdf},
  url       = {https://proceedings.mlr.press/v163/hayou22a.html}
}
EndNote
%0 Conference Paper
%T The Curse of Depth in Kernel Regime
%A Soufiane Hayou
%A Arnaud Doucet
%A Judith Rousseau
%B Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops
%C Proceedings of Machine Learning Research
%D 2022
%E Melanie F. Pradier
%E Aaron Schein
%E Stephanie Hyland
%E Francisco J. R. Ruiz
%E Jessica Z. Forde
%F pmlr-v163-hayou22a
%I PMLR
%P 41--47
%U https://proceedings.mlr.press/v163/hayou22a.html
%V 163
APA
Hayou, S., Doucet, A. & Rousseau, J. (2022). The Curse of Depth in Kernel Regime. Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops, in Proceedings of Machine Learning Research 163:41-47. Available from https://proceedings.mlr.press/v163/hayou22a.html.