A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel

Mohamad Amin Mohamadi; Wonho Bae; Danica J. Sutherland

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25061-25081, 2023.

Abstract

Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network’s representation: they are often far less expensive to compute and applicable more broadly than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $\mathcal O\big( (N O)^2\big)$ memory and up to $\mathcal O\big( (N O)^3 \big)$ computation to use. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits," converges to the true eNTK at initialization. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
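
To make the size gap concrete, the following is a minimal sketch of the arithmetic implied by the abstract, not code from the paper. The function name `kernel_costs`, the 4-byte (float32) entry size, and the example values of $N$ and $O$ are illustrative assumptions. It compares the entry count and memory of a full $NO \times NO$ kernel against an $N \times N$ one, and reports the ratios $O^2$ (memory) and $O^3$ (worst-case cubic computation).

```python
def kernel_costs(n_inputs: int, n_outputs: int, bytes_per_entry: int = 4) -> dict:
    """Entry counts and memory for a full eNTK vs. an N x N approximation.

    bytes_per_entry=4 assumes float32 storage (an illustrative choice).
    """
    full_side = n_inputs * n_outputs   # full eNTK is (N*O) x (N*O)
    approx_side = n_inputs             # approximation is N x N
    return {
        "full_entries": full_side ** 2,
        "approx_entries": approx_side ** 2,
        "full_bytes": full_side ** 2 * bytes_per_entry,
        "approx_bytes": approx_side ** 2 * bytes_per_entry,
        "memory_ratio": n_outputs ** 2,      # (N*O)^2 / N^2 = O^2
        "cubic_cost_ratio": n_outputs ** 3,  # (N*O)^3 / N^3 = O^3
    }


# Hypothetical setting: 10,000 inputs, 10-class classifier.
costs = kernel_costs(n_inputs=10_000, n_outputs=10)
for name, value in costs.items():
    print(f"{name}: {value:,}")
```

Under these assumed values the full kernel has $10^{10}$ entries (about 40 GB in float32), while the $N \times N$ kernel has $10^8$ entries (about 400 MB).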

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-mohamadi23a,
  title = 	 {A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel},
  author =       {Mohamadi, Mohamad Amin and Bae, Wonho and Sutherland, Danica J.},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {25061--25081},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/mohamadi23a/mohamadi23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/mohamadi23a.html},
  abstract = 	 {Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network’s representation: they are often far less expensive to compute and applicable more broadly than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $\mathcal O\big( (N O)^2\big)$ memory and up to $\mathcal O\big( (N O)^3 \big)$ computation to use. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits," converges to the true eNTK at initialization. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.}
}

Endnote

%0 Conference Paper
%T A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel
%A Mohamad Amin Mohamadi
%A Wonho Bae
%A Danica J. Sutherland
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-mohamadi23a
%I PMLR
%P 25061--25081
%U https://proceedings.mlr.press/v202/mohamadi23a.html
%V 202
%X Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network’s representation: they are often far less expensive to compute and applicable more broadly than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $\mathcal O\big( (N O)^2\big)$ memory and up to $\mathcal O\big( (N O)^3 \big)$ computation to use. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits," converges to the true eNTK at initialization. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.

APA


Mohamadi, M.A., Bae, W. & Sutherland, D.J. (2023). A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:25061-25081. Available from https://proceedings.mlr.press/v202/mohamadi23a.html.
