Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:1492-1512, 2023.
Abstract
Bayesian optimization (BO) is a technique for optimizing black-box target functions. At the core of BO is a surrogate model that predicts the output of the target function at previously unseen inputs, which guides the selection of promising input values. Gaussian processes (GPs) are commonly used as surrogate models but are known to scale poorly with the number of observations. Inducing point GP approximations can mitigate these scaling issues but may provide overly smooth estimates of the target function. In this work we adapt the Vecchia approximation, a popular GP approximation from spatial statistics, to enable scalable high-dimensional BO. We develop several improvements and extensions to Vecchia, including training warped GPs using mini-batch gradient descent, approximate neighbor search, and variance recalibration. We demonstrate the superior performance of Vecchia in BO using both Thompson sampling and qUCB. On several test functions and on two reinforcement-learning problems, our methods compared favorably to the state of the art, often outperforming inducing point methods and even exact GPs.
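The core idea of the Vecchia approximation can be sketched as follows. The joint GP density is factored by the chain rule, log p(y) = Σ_i log p(y_i | y_1, …, y_{i−1}), and each conditional is then restricted to a small conditioning set c(i) of at most m nearest neighbors among the points earlier in the ordering, so log p(y) ≈ Σ_i log p(y_i | y_{c(i)}). This is a minimal pure-Python sketch, not the authors' implementation. It assumes a squared-exponential kernel with unit output variance, a fixed noise variance, the input order as the ordering, and brute-force (exact) neighbor search. It omits the warping, mini-batch training, approximate neighbor search, and variance recalibration described above. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_exp_kernel(x: Vector, z: Vector, lengthscale: float = 1.0) -> float:
    """Squared-exponential kernel with unit output variance (an assumed choice)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * d2 / lengthscale ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L z = b for lower-triangular L."""
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def back_sub(low: List[List[float]], z: Sequence[float]) -> List[float]:
    """Solve L^T x = z, reusing the lower-triangular factor L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gram(points, kernel, noise: float) -> List[List[float]]:
    """Kernel matrix over points, with the noise variance added on the diagonal."""
    return [[kernel(p, q) + (noise if a == b else 0.0) for b, q in enumerate(points)]
            for a, p in enumerate(points)]


def vecchia_loglik(X, y, m: int, kernel=sq_exp_kernel, noise: float = 1e-4) -> float:
    """Vecchia log-likelihood: sum of log p(y_i | y_c(i)) with |c(i)| <= m."""
    total = 0.0
    for i, xi in enumerate(X):
        # c(i): up to m nearest neighbors of x_i among earlier points (brute force).
        prev = sorted(range(i), key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])))[:m]
        mean, var = 0.0, kernel(xi, xi) + noise
        if prev:
            low = cholesky(gram([X[j] for j in prev], kernel, noise))
            k_ci = [kernel(X[j], xi) for j in prev]
            w = back_sub(low, forward_sub(low, k_ci))  # w = K_cc^{-1} k_ci
            mean = sum(wj * y[j] for wj, j in zip(w, prev))  # conditional mean
            var -= sum(wj * kj for wj, kj in zip(w, k_ci))   # conditional variance
        total -= 0.5 * (math.log(2 * math.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(X, y, kernel=sq_exp_kernel, noise: float = 1e-4) -> float:
    """Exact zero-mean GP log-likelihood, O(n^3), for comparison."""
    low = cholesky(gram(X, kernel, noise))
    z = forward_sub(low, list(y))  # y^T K^{-1} y = ||L^{-1} y||^2
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(len(X)))
    return -0.5 * (sum(v * v for v in z) + logdet + len(X) * math.log(2 * math.pi))
```

With m = n − 1 every conditioning set contains all earlier points, so the sum reproduces the exact GP log-likelihood by the chain rule. With small m each conditional needs only an m × m solve, so the likelihood terms cost O(n m³) in total rather than O(n³). In this sketch the brute-force neighbor search still costs O(n²), which is the step an approximate neighbor search would replace.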