Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels

Alexander Immer, Tycho F. A. Van Der Ouderaa, Mark Van Der Wilk, Gunnar Rätsch, Bernhard Schölkopf
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:14333-14352, 2023.

Abstract

Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients computed on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, which limits the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds on the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow estimation accuracy to be traded off against computational complexity. We derive them using the function-space form of the linearized Laplace approximation, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
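
For a concrete picture of the function-space view mentioned in the abstract, below is a minimal JAX sketch. It is not the authors' code and does not implement the paper's lower bounds; it only evaluates a GP-style log marginal likelihood on a single mini-batch, using the empirical neural tangent kernel of a toy MLP as the kernel, so that gradients with respect to hyperparameters (here a prior precision and an observation-noise variance) can be taken from a subset of the data. The model, the regression setting, the centering of the prior at the current weights, and the names mlp, empirical_ntk, and batch_log_marglik are all illustrative assumptions.

import jax
import jax.numpy as jnp

def mlp(params, x):
    # Toy two-layer MLP; `params` is a tuple (w1, b1, w2, b2).
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)
    return h @ w2 + b2                      # (batch, 1)

def empirical_ntk(params, x):
    # Per-example Jacobians of the scalar output w.r.t. all parameters,
    # flattened into rows of J; the empirical NTK is the Gram matrix J J^T.
    def flat_jac(xi):
        jac = jax.jacobian(lambda p: mlp(p, xi[None, :]).squeeze())(params)
        return jnp.concatenate([j.reshape(-1) for j in jax.tree_util.tree_leaves(jac)])
    J = jax.vmap(flat_jac)(x)               # (batch, n_params)
    return J @ J.T                          # (batch, batch)

def batch_log_marglik(log_prior_prec, log_noise_var, params, x, y):
    # Gaussian log marginal likelihood of one mini-batch under a linearized
    # model with the parameter prior centered at the current weights
    # (a simplification): covariance = NTK / prior_precision + noise_var * I.
    K = empirical_ntk(params, x) / jnp.exp(log_prior_prec)
    S = K + jnp.exp(log_noise_var) * jnp.eye(x.shape[0])
    r = y - mlp(params, x).squeeze(-1)
    L = jnp.linalg.cholesky(S)
    alpha = jax.scipy.linalg.cho_solve((L, True), r)
    logdet = 2.0 * jnp.sum(jnp.log(jnp.diag(L)))
    return -0.5 * (r @ alpha + logdet + x.shape[0] * jnp.log(2.0 * jnp.pi))

# Hyperparameter gradients from a single mini-batch of 8 points.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (0.3 * jax.random.normal(k1, (3, 16)), jnp.zeros(16),
          0.3 * jax.random.normal(k2, (16, 1)), jnp.zeros(1))
x = jax.random.normal(k3, (8, 3))
y = jnp.sin(x[:, 0])
grads = jax.grad(batch_log_marglik, argnums=(0, 1))(0.0, -1.0, params, x, y)
print(grads)  # gradients w.r.t. log prior precision and log noise variance

The paper's contribution is to turn such function-space quantities into lower bounds on the full-data marginal likelihood whose gradients can be estimated stochastically; the sketch only illustrates where the NTK and the hyperparameter gradients enter.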

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-immer23b,
  title     = {Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels},
  author    = {Immer, Alexander and Van Der Ouderaa, Tycho F. A. and Van Der Wilk, Mark and Ratsch, Gunnar and Sch\"{o}lkopf, Bernhard},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {14333--14352},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/immer23b/immer23b.pdf},
  url       = {https://proceedings.mlr.press/v202/immer23b.html},
  abstract  = {Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations can allow to optimize such hyperparameters just like standard neural network parameters using gradients and on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow to trade off estimation accuracy against computational complexity. We derive them using the function-space form of the linearized Laplace, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.}
}
Endnote
%0 Conference Paper
%T Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
%A Alexander Immer
%A Tycho F. A. Van Der Ouderaa
%A Mark Van Der Wilk
%A Gunnar Ratsch
%A Bernhard Schölkopf
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-immer23b
%I PMLR
%P 14333--14352
%U https://proceedings.mlr.press/v202/immer23b.html
%V 202
%X Selecting hyperparameters in deep learning greatly impacts its effectiveness but requires manual effort and expertise. Recent works show that Bayesian model selection with Laplace approximations can allow to optimize such hyperparameters just like standard neural network parameters using gradients and on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow to trade off estimation accuracy against computational complexity. We derive them using the function-space form of the linearized Laplace, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
APA
Immer, A., Van Der Ouderaa, T.F.A., Van Der Wilk, M., Ratsch, G. & Schölkopf, B. (2023). Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:14333-14352. Available from https://proceedings.mlr.press/v202/immer23b.html.
