Rates of Convergence for Sparse Variational Gaussian Process Regression

David Burt; Carl Edward Rasmussen; Mark Van Der Wilk

Rates of Convergence for Sparse Variational Gaussian Process Regression

David Burt, Carl Edward Rasmussen, Mark Van Der Wilk

Proceedings of the 36th International Conference on Machine Learning, PMLR 97:862-871, 2019.

Abstract

Excellent variational approximations to Gaussian process posteriors have been developed which avoid the

$\mathcal{O}\left(N^3\right)$ scaling with dataset size

$N$ . They reduce the computational cost to

$\mathcal{O}\left(NM^2\right)$ , with

$M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in

$N$ , the true complexity of the algorithm depends on how

$M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing

$M$ more slowly than

$N$ . A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel,

$M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase

$M$ in continual learning scenarios.

Cite this Paper

BibTeX


@InProceedings{pmlr-v97-burt19a,
  title = 	 {Rates of Convergence for Sparse Variational {G}aussian Process Regression},
  author =       {Burt, David and Rasmussen, Carl Edward and Van Der Wilk, Mark},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {862--871},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--15 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/burt19a/burt19a.pdf},
  url = 	 {https://proceedings.mlr.press/v97/burt19a.html},
  abstract = 	 {Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.}
}

Endnote

%0 Conference Paper
%T Rates of Convergence for Sparse Variational Gaussian Process Regression
%A David Burt
%A Carl Edward Rasmussen
%A Mark Van Der Wilk
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov	
%F pmlr-v97-burt19a
%I PMLR
%P 862--871
%U https://proceedings.mlr.press/v97/burt19a.html
%V 97
%X Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.

APA


Burt, D., Rasmussen, C.E. & Van Der Wilk, M.. (2019). Rates of Convergence for Sparse Variational Gaussian Process Regression. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:862-871 Available from https://proceedings.mlr.press/v97/burt19a.html.

Rates of Convergence for Sparse Variational Gaussian Process Regression

Abstract

Cite this Paper

Related Material