Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$means Clustering
[edit]
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:70557063, 2019.
Abstract
Kernel methods generalize machine learning algorithms that only depend on the pairwise inner products of the dataset by replacing inner products with kernel evaluations, a function that passes input points through a nonlinear feature map before taking the inner product in a higher dimensional space. In this work, we present nearly tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$means clustering (KKMC) on $n$ input points. For KRR, our bound for relative error approximation the argmin of the objective function is $\Omega(nd_{\mathrm{eff}}^\lambda/\varepsilon)$ where $d_{\mathrm{eff}}^\lambda$ is the effective statistical dimension, tight up to a $\log(d_{\mathrm{eff}}^\lambda/\varepsilon)$ factor. For KKMC, our bound for finding a $k$clustering achieving a relative error approximation of the objective function is $\Omega(nk/\varepsilon)$, tight up to a $\log(k/\varepsilon)$ factor. Our KRR result resolves a variant of an open question of El Alaoui and Mahoney, asking whether the effective statistical dimension is a lower bound on the sampling complexity or not. Furthermore, for the important input distribution case of mixtures of Gaussians, we provide algorithms that bypass the above lower bounds.
Related Material


