Rehashing Kernel Evaluation in High Dimensions

Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:5789-5798, 2019.

Abstract

Kernel methods are effective but do not scale well to large datasets, especially in high dimensions, where the geometric data structures used to accelerate kernel evaluation suffer from the curse of dimensionality. Recent theoretical advances have proposed fast kernel evaluation algorithms leveraging hashing techniques with worst-case asymptotic improvements. However, these advances are largely confined to the theoretical realm due to concerns such as super-linear preprocessing time and diminishing gains on non-worst-case datasets. In this paper, we close the gap between theory and practice by addressing these challenges via provable and practical procedures for adaptive sample size selection, preprocessing time reduction, and refined variance bounds that quantify the data-dependent performance of random sampling and hashing-based kernel evaluation methods. Our experiments show that these new tools offer up to $10\times$ improvement in evaluation time on a range of synthetic and real-world datasets.
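For context, the quantity being accelerated is the kernel density of a query $q$ against a dataset $X = \{x_1, \dots, x_n\}$, i.e. $\mathrm{KDE}(q) = \frac{1}{n}\sum_{i=1}^{n} k(x_i, q)$. The sketch below is a minimal illustration of exact evaluation versus a uniform random-sampling estimator, whose data-dependent variance is what the paper's refined bounds and adaptive sample-size selection address; it is not the paper's hashing-based implementation, and the kernel, bandwidth, and sample size shown are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, q, sigma=1.0):
    """Gaussian kernel k(x, q) = exp(-||x - q||^2 / (2 sigma^2)), vectorized over rows of X."""
    return np.exp(-np.sum((X - q) ** 2, axis=-1) / (2.0 * sigma ** 2))

def kde_exact(X, q, sigma=1.0):
    """Exact kernel density (1/n) * sum_i k(x_i, q); costs O(n d) per query."""
    return gaussian_kernel(X, q, sigma).mean()

def kde_random_sampling(X, q, m, sigma=1.0, seed=None):
    """Unbiased estimate of the kernel density from m uniformly sampled points.
    Its relative variance grows as the true density shrinks, which is what
    motivates adaptive sample sizes and hashing-based estimators."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=m)
    return gaussian_kernel(X[idx], q, sigma).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100_000, 50))  # n points in d dimensions (illustrative)
    q = rng.normal(size=50)             # query point
    print("exact  :", kde_exact(X, q))
    print("sampled:", kde_random_sampling(X, q, m=2_000, seed=1))
```

For a fixed relative error, uniform sampling needs a sample size roughly inversely proportional to the true density, so low-density queries are the expensive case; this is the regime in which the paper's adaptive sample size selection and hashing-based estimators are designed to help.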

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-siminelakis19a,
  title     = {Rehashing Kernel Evaluation in High Dimensions},
  author    = {Siminelakis, Paris and Rong, Kexin and Bailis, Peter and Charikar, Moses and Levis, Philip},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {5789--5798},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/siminelakis19a/siminelakis19a.pdf},
  url       = {https://proceedings.mlr.press/v97/siminelakis19a.html},
  abstract  = {Kernel methods are effective but do not scale well to large scale data, especially in high dimensions where the geometric data structures used to accelerate kernel evaluation suffer from the curse of dimensionality. Recent theoretical advances have proposed fast kernel evaluation algorithms leveraging hashing techniques with worst-case asymptotic improvements. However, these advances are largely confined to the theoretical realm due to concerns such as super-linear preprocessing time and diminishing gains in non-worst case datasets. In this paper, we close the gap between theory and practice by addressing these challenges via provable and practical procedures for adaptive sample size selection, preprocessing time reduction, and refined variance bounds that quantify the data-dependent performance of random sampling and hashing-based kernel evaluation methods. Our experiments show that these new tools offer up to $10\times$ improvement in evaluation time on a range of synthetic and real-world datasets.}
}
Endnote
%0 Conference Paper
%T Rehashing Kernel Evaluation in High Dimensions
%A Paris Siminelakis
%A Kexin Rong
%A Peter Bailis
%A Moses Charikar
%A Philip Levis
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-siminelakis19a
%I PMLR
%P 5789--5798
%U https://proceedings.mlr.press/v97/siminelakis19a.html
%V 97
%X Kernel methods are effective but do not scale well to large scale data, especially in high dimensions where the geometric data structures used to accelerate kernel evaluation suffer from the curse of dimensionality. Recent theoretical advances have proposed fast kernel evaluation algorithms leveraging hashing techniques with worst-case asymptotic improvements. However, these advances are largely confined to the theoretical realm due to concerns such as super-linear preprocessing time and diminishing gains in non-worst case datasets. In this paper, we close the gap between theory and practice by addressing these challenges via provable and practical procedures for adaptive sample size selection, preprocessing time reduction, and refined variance bounds that quantify the data-dependent performance of random sampling and hashing-based kernel evaluation methods. Our experiments show that these new tools offer up to $10\times$ improvement in evaluation time on a range of synthetic and real-world datasets.
APA
Siminelakis, P., Rong, K., Bailis, P., Charikar, M. & Levis, P. (2019). Rehashing Kernel Evaluation in High Dimensions. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:5789-5798. Available from https://proceedings.mlr.press/v97/siminelakis19a.html.