Sharp analysis of low-rank kernel matrix approximations

Francis Bach

Proceedings of the 26th Annual Conference on Learning Theory, PMLR 30:185-209, 2013.

Abstract

We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.
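
As a rough illustration of the column-sampling idea described in the abstract, the following is a minimal sketch for kernel ridge regression and should not be read as the paper's exact algorithm. The Gaussian kernel, the uniform sampling of p columns, the regularization scaling n * lam, and all function names are assumptions made for this example. The degrees of freedom are computed with the usual definition tr(K (K + n*lam*I)^{-1}), which is also assumed here. Only an n x p block of the kernel matrix is formed, and the ridge solve works on p-dimensional features, which is where the O(p^2 n) cost comes from.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel between rows of A and rows of B (assumed kernel choice)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def degrees_of_freedom(K, lam):
    """tr(K (K + n*lam*I)^{-1}), computed from the eigenvalues of K."""
    n = K.shape[0]
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return float(np.sum(eigvals / (eigvals + n * lam)))


def column_sampling_features(X, p, rng, bandwidth=1.0, rel_tol=1e-10):
    """Features Phi (n x r, r <= p) with Phi @ Phi.T = C W^+ C^T.

    C holds p uniformly sampled columns of the kernel matrix and W is the
    p x p block of C on the sampled rows. The full n x n matrix is never built.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=p, replace=False)  # uniform sampling (assumption)
    C = gaussian_kernel(X, X[idx], bandwidth)   # n x p
    W = C[idx, :]                               # p x p
    W = 0.5 * (W + W.T)                         # remove rounding asymmetry
    evals, evecs = np.linalg.eigh(W)
    keep = evals > rel_tol * evals.max()        # drop numerically null directions
    return C @ (evecs[:, keep] / np.sqrt(evals[keep]))


def ridge_fitted_values(Phi, y, lam):
    """Fitted values of ridge regression on the features Phi, in O(r^2 n) time."""
    n, r = Phi.shape
    A = Phi.T @ Phi + n * lam * np.eye(r)
    w = np.linalg.solve(A, Phi.T @ y)
    return Phi @ w
```

With p equal to n the sampled columns span the whole kernel matrix, so the fitted values should agree with exact kernel ridge regression up to numerical error. For smaller p, comparing p against the value returned by degrees_of_freedom gives a feel for the regime the abstract refers to.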

Cite this Paper

BibTeX


@InProceedings{pmlr-v30-Bach13,
  title = 	 {Sharp analysis of low-rank kernel matrix approximations},
  author = 	 {Bach, Francis},
  booktitle = 	 {Proceedings of the 26th Annual Conference on Learning Theory},
  pages = 	 {185--209},
  year = 	 {2013},
  editor = 	 {Shalev-Shwartz, Shai and Steinwart, Ingo},
  volume = 	 {30},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Princeton, NJ, USA},
  month = 	 {12--14 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v30/Bach13.pdf},
  url = 	 {https://proceedings.mlr.press/v30/Bach13.html},
  abstract = 	 {We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the \emph{degrees of freedom} associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same \emph{predictive performance} as existing algorithms, for any given problem instance, and not only for worst-case situations.}
}

Endnote

%0 Conference Paper
%T Sharp analysis of low-rank kernel matrix approximations
%A Francis Bach
%B Proceedings of the 26th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2013
%E Shai Shalev-Shwartz
%E Ingo Steinwart
%F pmlr-v30-Bach13
%I PMLR
%P 185--209
%U https://proceedings.mlr.press/v30/Bach13.html
%V 30
%X We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.

RIS


TY  - CPAPER
TI  - Sharp analysis of low-rank kernel matrix approximations
AU  - Francis Bach
BT  - Proceedings of the 26th Annual Conference on Learning Theory
DA  - 2013/06/13
ED  - Shai Shalev-Shwartz
ED  - Ingo Steinwart
ID  - pmlr-v30-Bach13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 30
SP  - 185
EP  - 209
L1  - http://proceedings.mlr.press/v30/Bach13.pdf
UR  - https://proceedings.mlr.press/v30/Bach13.html
AB  - We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.
ER  -

APA


Bach, F. (2013). Sharp analysis of low-rank kernel matrix approximations. Proceedings of the 26th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 30:185-209. Available from https://proceedings.mlr.press/v30/Bach13.html.
