Distributed Adaptive Sampling for Kernel Matrix Approximation

Daniele Calandriello, Alessandro Lazaric, Michal Valko
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:1421-1429, 2017.

Abstract

Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for $n$ samples. Recent works (Alaoui 2014, Musco 2016) show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $K_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that \emphsequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_eff(γ)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK never constructs the whole matrix $K_n$, runs in linear time $\widetildeO(nd_eff(γ)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK achieving similar accuracy in as little as $\widetildeO(\log(n)d_eff(γ)^3)$ time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-calandriello17a, title = {{Distributed Adaptive Sampling for Kernel Matrix Approximation}}, author = {Calandriello, Daniele and Lazaric, Alessandro and Valko, Michal}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics}, pages = {1421--1429}, year = {2017}, editor = {Singh, Aarti and Zhu, Jerry}, volume = {54}, series = {Proceedings of Machine Learning Research}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/calandriello17a/calandriello17a.pdf}, url = {https://proceedings.mlr.press/v54/calandriello17a.html}, abstract = {Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for $n$ samples. Recent works (Alaoui 2014, Musco 2016) show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $K_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that \emphsequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_eff(γ)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK never constructs the whole matrix $K_n$, runs in linear time $\widetildeO(nd_eff(γ)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK achieving similar accuracy in as little as $\widetildeO(\log(n)d_eff(γ)^3)$ time.} }
Endnote
%0 Conference Paper %T Distributed Adaptive Sampling for Kernel Matrix Approximation %A Daniele Calandriello %A Alessandro Lazaric %A Michal Valko %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-calandriello17a %I PMLR %P 1421--1429 %U https://proceedings.mlr.press/v54/calandriello17a.html %V 54 %X Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for $n$ samples. Recent works (Alaoui 2014, Musco 2016) show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $K_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that \emphsequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_eff(γ)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK never constructs the whole matrix $K_n$, runs in linear time $\widetildeO(nd_eff(γ)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK achieving similar accuracy in as little as $\widetildeO(\log(n)d_eff(γ)^3)$ time.
APA
Calandriello, D., Lazaric, A. & Valko, M.. (2017). Distributed Adaptive Sampling for Kernel Matrix Approximation. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:1421-1429 Available from https://proceedings.mlr.press/v54/calandriello17a.html.

Related Material