Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

Daniele Calandriello; Luigi Carratino; Alessandro Lazaric; Michal Valko; Lorenzo Rosasco

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:533-557, 2019.

Abstract

Gaussian processes (GP) are a stochastic processes, used as Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least \emph{quadratic} in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3 A)$ is prohibitive. In this paper, we introduce BKB (\textit{budgeted kernelized bandit}), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) leverage score sampling as a randomized matrix sketching and prove that selecting inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from \emph{variance starvation}, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\widetilde{O}(d_{eff})$ points, where $d_{eff}$ is the \emph{effective} dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(T A d_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity.

Cite this Paper

BibTeX


@InProceedings{pmlr-v99-calandriello19a,
  title = 	 {Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret},
  author =       {Calandriello, Daniele and Carratino, Luigi and Lazaric, Alessandro and Valko, Michal and Rosasco, Lorenzo},
  booktitle = 	 {Proceedings of the Thirty-Second Conference on Learning Theory},
  pages = 	 {533--557},
  year = 	 {2019},
  editor = 	 {Beygelzimer, Alina and Hsu, Daniel},
  volume = 	 {99},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--28 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v99/calandriello19a/calandriello19a.pdf},
  url = 	 {https://proceedings.mlr.press/v99/calandriello19a.html},
  abstract = 	 { Gaussian processes (GP) are a stochastic processes, used  as Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least \emph{quadratic} in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3 A)$ is prohibitive.  In this paper, we introduce BKB (\textit{budgeted kernelized bandit}), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB)  leverage score sampling as a randomized matrix sketching and  prove that selecting inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from \emph{variance starvation}, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\widetilde{O}(d_{eff})$ points, where $d_{eff}$ is the \emph{effective} dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(T A d_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity.}
}

Endnote

%0 Conference Paper
%T Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
%A Daniele Calandriello
%A Luigi Carratino
%A Alessandro Lazaric
%A Michal Valko
%A Lorenzo Rosasco
%B Proceedings of the Thirty-Second Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Alina Beygelzimer
%E Daniel Hsu	
%F pmlr-v99-calandriello19a
%I PMLR
%P 533--557
%U https://proceedings.mlr.press/v99/calandriello19a.html
%V 99
%X  Gaussian processes (GP) are a stochastic processes, used  as Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms hardly scale to high-dimensional functions, as their per-iteration time and space cost is at least \emph{quadratic} in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3 A)$ is prohibitive.  In this paper, we introduce BKB (\textit{budgeted kernelized bandit}), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and remarkably no assumption on the input space or covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB)  leverage score sampling as a randomized matrix sketching and  prove that selecting inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from \emph{variance starvation}, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\widetilde{O}(d_{eff})$ points, where $d_{eff}$ is the \emph{effective} dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to a $O(T A d_{eff}^2)$ runtime and $O(A d_{eff})$ space complexity.

APA


Calandriello, D., Carratino, L., Lazaric, A., Valko, M. & Rosasco, L.. (2019). Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret. Proceedings of the Thirty-Second Conference on Learning Theory, in Proceedings of Machine Learning Research 99:533-557 Available from https://proceedings.mlr.press/v99/calandriello19a.html.

Related Material

Download PDF