Rates of Convergence for Sparse Variational Gaussian Process Regression

David Burt, Carl Edward Rasmussen, Mark Van Der Wilk
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:862-871, 2019.

Abstract

Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-burt19a, title = {Rates of Convergence for Sparse Variational {G}aussian Process Regression}, author = {Burt, David and Rasmussen, Carl Edward and Van Der Wilk, Mark}, booktitle = {Proceedings of the 36th International Conference on Machine Learning}, pages = {862--871}, year = {2019}, editor = {Kamalika Chaudhuri and Ruslan Salakhutdinov}, volume = {97}, series = {Proceedings of Machine Learning Research}, month = {09--15 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v97/burt19a/burt19a.pdf}, url = { http://proceedings.mlr.press/v97/burt19a.html }, abstract = {Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.} }
Endnote
%0 Conference Paper %T Rates of Convergence for Sparse Variational Gaussian Process Regression %A David Burt %A Carl Edward Rasmussen %A Mark Van Der Wilk %B Proceedings of the 36th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2019 %E Kamalika Chaudhuri %E Ruslan Salakhutdinov %F pmlr-v97-burt19a %I PMLR %P 862--871 %U http://proceedings.mlr.press/v97/burt19a.html %V 97 %X Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in D-dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
APA
Burt, D., Rasmussen, C.E. & Van Der Wilk, M.. (2019). Rates of Convergence for Sparse Variational Gaussian Process Regression. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:862-871 Available from http://proceedings.mlr.press/v97/burt19a.html .

Related Material