The Curse of Depth in Kernel Regime
Proceedings on "I (Still) Can't Believe It's Not Better!" at NeurIPS 2021 Workshops, PMLR 163:41-47, 2022.
Abstract
Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Empirical results in Lee et al. (2019) demonstrated the high performance of a linearized version of training in this so-called NTK regime. In this paper, we show that the large depth limit of this regime is unexpectedly trivial, and we fully characterize the convergence rate to this trivial regime.
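To make the NTK construction referred to above concrete, the following is a minimal, hypothetical sketch (not taken from the paper) that computes the empirical NTK of a small fully connected network in JAX, i.e. the Gram matrix of per-example parameter gradients, under the standard NTK parameterization. The architecture, widths, and inputs are illustrative assumptions.

```python
# Illustrative sketch: empirical NTK of a small MLP under NTK parameterization.
# All choices (depth, widths, tanh activation, inputs) are assumptions for the example.
import jax
import jax.numpy as jnp

def init_mlp(key, widths):
    """Initialize weights with standard normal entries (NTK parameterization)."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_in, d_out)))
    return params

def mlp(params, x):
    """Scalar-output MLP; each layer is scaled by 1/sqrt(fan_in) (NTK scaling)."""
    h = x
    for w in params[:-1]:
        h = jnp.tanh(h @ w / jnp.sqrt(w.shape[0]))
    return (h @ params[-1] / jnp.sqrt(params[-1].shape[0])).squeeze(-1)

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = J(x1) J(x2)^T, with J the Jacobian of the output w.r.t. parameters."""
    def per_example_grad(x):
        grads = jax.grad(lambda p: mlp(p, x[None, :])[0])(params)
        return jnp.concatenate([g.ravel() for g in grads])
    j1 = jax.vmap(per_example_grad)(x1)  # (n1, num_params)
    j2 = jax.vmap(per_example_grad)(x2)  # (n2, num_params)
    return j1 @ j2.T                     # (n1, n2) kernel matrix

key = jax.random.PRNGKey(0)
params = init_mlp(key, [3, 128, 128, 1])
x = jax.random.normal(jax.random.PRNGKey(1), (5, 3))
print(empirical_ntk(params, x, x))  # 5 x 5 empirical NTK at initialization
```

In the linearized (NTK-regime) view of training discussed in the abstract, gradient descent on the network is approximated by kernel gradient descent with this matrix as the kernel; stacking more layers into `init_mlp` above is the "large depth" direction whose limiting behavior the paper characterizes.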