Preconditioning for Scalable Gaussian Process Hyperparameter Optimization

Jonathan Wenger, Geoff Pleiss, Philipp Hennig, John Cunningham, Jacob Gardner
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:23751-23780, 2022.

Abstract

Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.
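For a concrete picture of the two numerical primitives the abstract refers to, the following toy sketch (plain NumPy/SciPy, not the authors' implementation) combines a preconditioned conjugate gradient solve with the regularized kernel matrix K_hat = K + sigma^2 I and a Hutchinson-style stochastic trace estimate of the log-determinant derivative, using d/dl log det(K_hat) = tr(K_hat^{-1} dK/dl). The RBF kernel, the rank-k eigendecomposition preconditioner (a stand-in for the low-rank preconditioners analyzed in the paper), and all problem sizes and hyperparameter values are illustrative assumptions.

# Illustrative sketch only: preconditioned CG for kernel solves plus Hutchinson
# stochastic trace estimation of tr(K_hat^{-1} dK/dl). All sizes, the RBF kernel,
# and the low-rank preconditioner are assumed toy choices for demonstration.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, sigma2, lengthscale = 500, 1e-2, 0.5         # assumed problem size and hyperparameters
X = rng.uniform(size=(n, 1))
D2 = (X - X.T) ** 2                             # squared distances (1-D inputs)
K = np.exp(-0.5 * D2 / lengthscale**2)          # RBF kernel matrix
K_hat = K + sigma2 * np.eye(n)                  # K + sigma^2 I

# Rank-k preconditioner P = U_k diag(w_k) U_k^T + sigma^2 I, inverted via the
# Woodbury identity (a stand-in for the preconditioners studied in the paper).
k = 20
w, U = np.linalg.eigh(K)
w_k, U_k = w[-k:], U[:, -k:]                    # top-k eigenpairs of K

def apply_P_inv(v):
    v = np.asarray(v).reshape(-1)
    coef = w_k / (sigma2 * (w_k + sigma2))
    return v / sigma2 - U_k @ (coef * (U_k.T @ v))

P_inv = LinearOperator((n, n), matvec=apply_P_inv)

def solve_K_hat(b):
    # Solve K_hat x = b with preconditioned conjugate gradients.
    x, info = cg(K_hat, b, M=P_inv, maxiter=1000)
    assert info == 0, "CG did not converge"
    return x

# Hutchinson estimator of tr(K_hat^{-1} dK/dl) with Rademacher probe vectors.
dK_dl = K * D2 / lengthscale**3                 # derivative of the RBF kernel w.r.t. lengthscale
num_probes = 30
samples = []
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=n)
    samples.append(z @ solve_K_hat(dK_dl @ z))

print("stochastic estimate:", np.mean(samples))
print("exact trace:        ", np.trace(np.linalg.solve(K_hat, dK_dl)))

In this sketch the preconditioner is applied through scipy's M argument to cg, and the same preconditioned solver is reused inside the trace estimator; the paper's contribution is to analyze exactly how such a preconditioner accelerates convergence and reduces the variance of these stochastic estimates.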

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-wenger22a,
  title     = {Preconditioning for Scalable {G}aussian Process Hyperparameter Optimization},
  author    = {Wenger, Jonathan and Pleiss, Geoff and Hennig, Philipp and Cunningham, John and Gardner, Jacob},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {23751--23780},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/wenger22a/wenger22a.pdf},
  url       = {https://proceedings.mlr.press/v162/wenger22a.html},
  abstract  = {Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.}
}
Endnote
%0 Conference Paper
%T Preconditioning for Scalable Gaussian Process Hyperparameter Optimization
%A Jonathan Wenger
%A Geoff Pleiss
%A Philipp Hennig
%A John Cunningham
%A Jacob Gardner
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-wenger22a
%I PMLR
%P 23751--23780
%U https://proceedings.mlr.press/v162/wenger22a.html
%V 162
%X Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.
APA
Wenger, J., Pleiss, G., Hennig, P., Cunningham, J. & Gardner, J. (2022). Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:23751-23780. Available from https://proceedings.mlr.press/v162/wenger22a.html.
