Deep Kernel Learning

Andrew Gordon Wilson, Zhiting Hu, Ruslan Salakhutdinov, Eric P. Xing
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:370-378, 2016.

Abstract

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure-exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.
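To make the construction concrete, the sketch below shows the core idea in plain NumPy: inputs are warped through a deep architecture g(x; w), a base kernel is applied in the warped space, and all parameters are learned jointly through the exact GP log marginal likelihood. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the tiny tanh network and its weights are hypothetical, an RBF base kernel stands in for the paper's spectral mixture kernel, and the Cholesky-based likelihood shown here costs O(n^3), which is precisely the cost the paper's local kernel interpolation, inducing points, and Kronecker/Toeplitz algebra reduce to O(n).

```python
import numpy as np

def rbf_kernel(Z1, Z2, lengthscale=1.0):
    # Base kernel applied in the warped space. The paper uses a spectral
    # mixture base kernel; an RBF kernel is substituted here for brevity.
    d2 = np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :] - 2 * Z1 @ Z2.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def net(X, W1, W2):
    # Stand-in deep architecture g(x; w): a tiny two-layer tanh network
    # (hypothetical; the paper uses much larger feedforward/convolutional nets).
    return np.tanh(np.tanh(X @ W1) @ W2)

def deep_kernel(X1, X2, W1, W2):
    # Deep kernel: k(x, x') = k_base(g(x; w), g(x'; w)).
    return rbf_kernel(net(X1, W1, W2), net(X2, W1, W2))

def log_marginal_likelihood(X, y, W1, W2, noise=0.1):
    # Exact GP log marginal likelihood, the single objective through which
    # network weights and kernel hyperparameters are learned jointly.
    # Evaluated naively here at O(n^3); the paper's KISS-GP machinery
    # brings training to O(n) and prediction to O(1) per test point.
    n = len(y)
    K = deep_kernel(X, X, W1, W2) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))       # -0.5 log|K|
            - 0.5 * n * np.log(2 * np.pi))

# Toy usage: evaluate the objective at random parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
print(log_marginal_likelihood(X, y, W1, W2))
```

In practice one would optimize W1, W2, the lengthscale, and the noise by gradient ascent on this objective; because the deep kernel is closed-form, it can drop into any GP model in place of a standard kernel.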

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-wilson16,
  title     = {Deep Kernel Learning},
  author    = {Wilson, Andrew Gordon and Hu, Zhiting and Salakhutdinov, Ruslan and Xing, Eric P.},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages     = {370--378},
  year      = {2016},
  editor    = {Gretton, Arthur and Robert, Christian C.},
  volume    = {51},
  series    = {Proceedings of Machine Learning Research},
  address   = {Cadiz, Spain},
  month     = {09--11 May},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v51/wilson16.pdf},
  url       = {https://proceedings.mlr.press/v51/wilson16.html}
}
Endnote
%0 Conference Paper
%T Deep Kernel Learning
%A Andrew Gordon Wilson
%A Zhiting Hu
%A Ruslan Salakhutdinov
%A Eric P. Xing
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-wilson16
%I PMLR
%P 370--378
%U https://proceedings.mlr.press/v51/wilson16.html
%V 51
RIS
TY - CPAPER
TI - Deep Kernel Learning
AU - Andrew Gordon Wilson
AU - Zhiting Hu
AU - Ruslan Salakhutdinov
AU - Eric P. Xing
BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA - 2016/05/02
ED - Arthur Gretton
ED - Christian C. Robert
ID - pmlr-v51-wilson16
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 51
SP - 370
EP - 378
L1 - http://proceedings.mlr.press/v51/wilson16.pdf
UR - https://proceedings.mlr.press/v51/wilson16.html
ER -
APA
Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016). Deep Kernel Learning. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:370-378. Available from https://proceedings.mlr.press/v51/wilson16.html.
