On the Nyström Approximation for Preconditioning in Kernel Machines

Amirhesam Abedsoltan, Parthe Pandit, Luis Rademacher, Mikhail Belkin
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:3718-3726, 2024.

Abstract

Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral preconditioner can be expensive, leading to large computational and storage overheads that preclude the application of kernel methods to problems with large datasets. A Nyström approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nyström-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads.
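
As a rough illustration of the setup the abstract describes (this is not code from the paper), the sketch below preconditions gradient descent for a kernel system K alpha = y by damping the top-q eigendirections of K, with those eigendirections estimated from a small Nyström subsample of size m << n. The Gaussian kernel, step size, subsample size m, number of damped directions q, and all function names are illustrative assumptions.

# Minimal sketch, assuming a Gaussian kernel and a Nystrom estimate of the
# top-q eigenpairs of the n x n kernel matrix; not the authors' released code.
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Z."""
    sq = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Z ** 2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def nystrom_spectral_preconditioner(X, kernel, m, q, rng):
    """Estimate the top (q+1) eigenpairs of K = kernel(X, X) from m sampled
    points and return (apply_P, lam_q1), where apply_P(v) approximates
        P v = v - sum_{i <= q} (1 - lam_{q+1} / lam_i) e_i e_i^T v,
    i.e. a spectral preconditioner built from approximate eigenvectors."""
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    K_nm = kernel(X, X[idx])                      # n x m cross-kernel block
    K_mm = kernel(X[idx], X[idx])                 # m x m subsample kernel
    lam_mm, U_mm = np.linalg.eigh(K_mm)           # eigenvalues in ascending order
    top = np.argsort(lam_mm)[::-1][: q + 1]
    lam_mm, U_mm = lam_mm[top], U_mm[:, top]
    # Nystrom extension of the subsample eigenvectors to all n points.
    E = K_nm @ (U_mm / lam_mm)                    # n x (q+1)
    E /= np.linalg.norm(E, axis=0)                # roughly orthonormal columns
    lam = lam_mm * (n / m)                        # rescaled eigenvalue estimates
    damp = 1.0 - lam[q] / lam[:q]                 # shrink factors for top-q directions
    E_q = E[:, :q]

    def apply_P(v):
        return v - E_q @ (damp * (E_q.T @ v))

    return apply_P, lam[q]


def preconditioned_gd(X, y, kernel, m=200, q=20, steps=100, seed=0):
    """Preconditioned gradient descent (Richardson iteration) for K alpha = y."""
    rng = np.random.default_rng(seed)
    K = kernel(X, X)
    apply_P, lam_q1 = nystrom_spectral_preconditioner(X, kernel, m, q, rng)
    lr = 1.0 / lam_q1            # after damping, the remaining top curvature is ~lam_{q+1}
    alpha = np.zeros_like(y, dtype=float)
    for _ in range(steps):
        residual = K @ alpha - y                  # gradient direction for K alpha = y
        alpha -= lr * apply_P(residual)           # preconditioned update
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
    alpha = preconditioned_gd(X, y, gaussian_kernel, m=200, q=20, steps=200)
    print("training residual:", np.linalg.norm(gaussian_kernel(X, X) @ alpha - y))

The point of the approximation is that only the m x m subsample block is eigendecomposed, so the cost of building the preconditioner scales with m rather than with the full dataset size n.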

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-abedsoltan24a,
  title     = {On the {N}yström Approximation for Preconditioning in Kernel Machines},
  author    = {Abedsoltan, Amirhesam and Pandit, Parthe and Rademacher, Luis and Belkin, Mikhail},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {3718--3726},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/abedsoltan24a/abedsoltan24a.pdf},
  url       = {https://proceedings.mlr.press/v238/abedsoltan24a.html},
  abstract  = {Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral preconditioner can be expensive, leading to large computational and storage overheads that preclude the application of kernel methods to problems with large datasets. A Nyström approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nyström-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads.}
}
Endnote
%0 Conference Paper
%T On the Nyström Approximation for Preconditioning in Kernel Machines
%A Amirhesam Abedsoltan
%A Parthe Pandit
%A Luis Rademacher
%A Mikhail Belkin
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-abedsoltan24a
%I PMLR
%P 3718--3726
%U https://proceedings.mlr.press/v238/abedsoltan24a.html
%V 238
%X Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral preconditioner can be expensive, leading to large computational and storage overheads that preclude the application of kernel methods to problems with large datasets. A Nyström approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nyström-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads.
APA
Abedsoltan, A., Pandit, P., Rademacher, L. & Belkin, M. (2024). On the Nyström Approximation for Preconditioning in Kernel Machines. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:3718-3726. Available from https://proceedings.mlr.press/v238/abedsoltan24a.html.