Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time

David Woodruff, Amir Zandieh
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:23933-23964, 2022.

Abstract

We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. Furthermore, for the important special case of the q-fold self-tensoring of a dataset, which is the feature matrix of the degree-q polynomial kernel, the leading term of our method’s runtime is proportional to the size of the dataset and has no dependence on q. Previous techniques either incur a poly(q) factor slowdown in their runtime, or remove the dependence on q at the expense of a sub-optimal target dimension and a runtime that depends quadratically on the number of data points. Our sampling technique relies on a collection of q partially correlated random projections that can be applied simultaneously to a dataset X in total time depending only on the size of X, while their q-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.
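To make the objects in the abstract concrete, the following is a minimal NumPy sketch, not the paper's input-sparsity-time algorithm: it forms the q-fold column-wise self-tensoring of a toy dataset explicitly, checks that its Gram matrix is the degree-q polynomial kernel matrix, and performs plain leverage score sampling of its rows via an exact SVD. All names and sizes (X, d, n, q, m) are illustrative assumptions, and the dense SVD used to get the leverage scores is precisely the expensive step the paper's correlated random projections avoid.

```python
# Illustrative sketch only: exact leverage score sampling for the Gram matrix
# of the q-fold self-tensoring of X (the degree-q polynomial kernel features).
# Toy dimensions; the paper's contribution is doing the sampling step below
# in input sparsity time instead of via a dense SVD on a d^q-row matrix.
import numpy as np

rng = np.random.default_rng(0)
d, n, q = 4, 8, 3            # data in R^d as columns of X; Phi has d^q = 64 rows

X = rng.standard_normal((d, n))

# q-fold column-wise tensor (Khatri-Rao) product: column k becomes x_k (x) x_k (x) x_k.
Phi = X.copy()
for _ in range(q - 1):
    Phi = np.einsum("ik,jk->ijk", Phi, X).reshape(-1, n)   # now d^q x n

# The n x n Gram matrix of Phi is the entrywise q-th power of X^T X,
# i.e. the degree-q polynomial kernel matrix, so Phi never needs to be
# materialized just to know its Gram.
G = Phi.T @ Phi
assert np.allclose(G, (X.T @ X) ** q)

# Row leverage scores of Phi via an exact SVD (cost scales with d^q; this is
# the step the paper replaces with q partially correlated random projections).
U, _, _ = np.linalg.svd(Phi, full_matrices=False)
lev = np.sum(U ** 2, axis=1)          # leverage score of each row; sums to rank(Phi)
p = lev / lev.sum()

# Sample m rows with probability proportional to leverage, rescale, and check
# that the sampled Gram spectrally approximates the true one.
m = 25 * n
idx = rng.choice(Phi.shape[0], size=m, p=p)
S = Phi[idx] / np.sqrt(m * p[idx])[:, None]
G_hat = S.T @ S

err = np.linalg.norm(G_hat - G, 2) / np.linalg.norm(G, 2)
print(f"relative spectral error of sampled Gram: {err:.3f}")
```

In this sketch the number of samples m is nearly optimal up to logarithmic factors (roughly the statistical dimension of the Gram matrix, here at most n); the paper achieves the same sample bound while computing approximate leverage scores in time depending only on the size of X.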

Cite this Paper


BibTeX
@InProceedings{pmlr-v162-woodruff22a,
  title     = {Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time},
  author    = {Woodruff, David and Zandieh, Amir},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {23933--23964},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/woodruff22a/woodruff22a.pdf},
  url       = {https://proceedings.mlr.press/v162/woodruff22a.html},
  abstract  = {We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. Furthermore, for the important special case of the q-fold self-tensoring of a dataset, which is the feature matrix of the degree-q polynomial kernel, the leading term of our method’s runtime is proportional to the size of the dataset and has no dependence on q. Previous techniques either incur a poly(q) factor slowdown in their runtime or remove the dependence on q at the expense of having sub-optimal target dimension, and depend quadratically on the number of data-points in their runtime. Our sampling technique relies on a collection of q partially correlated random projections which can be simultaneously applied to a dataset X in total time that only depends on the size of X, and at the same time their q-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.}
}
Endnote
%0 Conference Paper
%T Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time
%A David Woodruff
%A Amir Zandieh
%B Proceedings of the 39th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2022
%E Kamalika Chaudhuri
%E Stefanie Jegelka
%E Le Song
%E Csaba Szepesvari
%E Gang Niu
%E Sivan Sabato
%F pmlr-v162-woodruff22a
%I PMLR
%P 23933--23964
%U https://proceedings.mlr.press/v162/woodruff22a.html
%V 162
%X We propose an input sparsity time sampling algorithm that can spectrally approximate the Gram matrix corresponding to the q-fold column-wise tensor product of q matrices using a nearly optimal number of samples, improving upon all previously known methods by poly(q) factors. Furthermore, for the important special case of the q-fold self-tensoring of a dataset, which is the feature matrix of the degree-q polynomial kernel, the leading term of our method’s runtime is proportional to the size of the dataset and has no dependence on q. Previous techniques either incur a poly(q) factor slowdown in their runtime or remove the dependence on q at the expense of having sub-optimal target dimension, and depend quadratically on the number of data-points in their runtime. Our sampling technique relies on a collection of q partially correlated random projections which can be simultaneously applied to a dataset X in total time that only depends on the size of X, and at the same time their q-fold Kronecker product acts as a near-isometry for any fixed vector in the column span of $X^{\otimes q}$. We also show that our sampling methods generalize to other classes of kernels beyond polynomial, such as Gaussian and Neural Tangent kernels.
APA
Woodruff, D. & Zandieh, A. (2022). Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time. Proceedings of the 39th International Conference on Machine Learning, in Proceedings of Machine Learning Research 162:23933-23964. Available from https://proceedings.mlr.press/v162/woodruff22a.html.