Deep kernel processes

Laurence Aitchison; Adam Yang; Sebastian W. Ober

Proceedings of the 38th International Conference on Machine Learning, PMLR 139:130-140, 2021.

Abstract

We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed, and as we show, standard isotropic kernels can be written entirely in terms of this Gram matrix; we do not need knowledge of the underlying features. We define a tractable deep kernel process, the deep inverse Wishart process, and give a doubly-stochastic inducing-point variational inference scheme that operates on the Gram matrices rather than on the features, as is done in DGPs. We show that the deep inverse Wishart process outperforms DGPs and infinite BNNs on fully-connected baselines.
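The claim that an isotropic kernel needs only the Gram matrix can be checked numerically. The sketch below is illustrative and not taken from the paper: it assumes a squared-exponential kernel with unit lengthscale and a Gram matrix normalised by the feature width, and the function names (`gram`, `rbf_from_gram`, `rbf_from_features`) are hypothetical. It relies on the identity ||f_i - f_j||^2 = <f_i, f_i> + <f_j, f_j> - 2<f_i, f_j>, so the pairwise distances an isotropic kernel depends on are recoverable from Gram entries alone.

```python
import math
import random


def gram(features):
    """Gram matrix G[i][j] = <f_i, f_j> / width (width normalisation is an assumption)."""
    width = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / width for fj in features]
            for fi in features]


def rbf_from_gram(G):
    """Squared-exponential kernel computed from Gram entries only.

    Uses ||f_i - f_j||^2 / width = G[i][i] + G[j][j] - 2 * G[i][j].
    """
    n = len(G)
    return [[math.exp(-0.5 * (G[i][i] + G[j][j] - 2.0 * G[i][j]))
             for j in range(n)] for i in range(n)]


def rbf_from_features(features):
    """The same kernel computed directly from the features, for comparison."""
    width = len(features[0])
    return [[math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(fi, fj)) / width)
             for fj in features] for fi in features]


random.seed(0)
# 4 data points with 5 Gaussian features each (sizes chosen arbitrarily).
F = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]

K_gram = rbf_from_gram(gram(F))
K_feat = rbf_from_features(F)

# Largest elementwise disagreement between the two routes.
max_gap = max(abs(K_gram[i][j] - K_feat[i][j])
              for i in range(4) for j in range(4))
print(max_gap)
```

The two kernel matrices agree up to floating-point rounding, which is the sense in which a layer can pass forward a Gram matrix instead of the features themselves.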

Cite this Paper

BibTeX

@InProceedings{pmlr-v139-aitchison21a,
  title = 	 {Deep Kernel Processes},
  author =       {Aitchison, Laurence and Yang, Adam and Ober, Sebastian W},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {130--140},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/aitchison21a/aitchison21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/aitchison21a.html},
  abstract = 	 {We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed, and as we show, standard isotropic kernels can be written entirely in terms of this Gram matrix — we do not need knowledge of the underlying features. We define a tractable deep kernel process, the deep inverse Wishart process, and give a doubly-stochastic inducing-point variational inference scheme that operates on the Gram matrices, not on the features, as in DGPs. We show that the deep inverse Wishart process gives superior performance to DGPs and infinite BNNs on fully-connected baselines.}
}

Endnote

%0 Conference Paper
%T Deep kernel processes
%A Laurence Aitchison
%A Adam Yang
%A Sebastian W. Ober
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-aitchison21a
%I PMLR
%P 130--140
%U https://proceedings.mlr.press/v139/aitchison21a.html
%V 139
%X We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed, and as we show, standard isotropic kernels can be written entirely in terms of this Gram matrix — we do not need knowledge of the underlying features. We define a tractable deep kernel process, the deep inverse Wishart process, and give a doubly-stochastic inducing-point variational inference scheme that operates on the Gram matrices, not on the features, as in DGPs. We show that the deep inverse Wishart process gives superior performance to DGPs and infinite BNNs on fully-connected baselines.

APA

Aitchison, L., Yang, A. & Ober, S.W. (2021). Deep kernel processes. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:130-140. Available from https://proceedings.mlr.press/v139/aitchison21a.html.
