Sharper Bounds for $\ell_p$ Sensitivity Sampling

David Woodruff; Taisuke Yasuda

In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension

$d$ and the total sensitivity

$\mathfrak{S}$ in remarkably general settings. However, guarantees going beyond this general bound of

$\mathfrak{S} d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for

$\ell_p$ subspace embeddings for

$p\neq 2$ that improve over the general

$\mathfrak{S} d$ bound, achieving a bound of roughly

$\mathfrak{S}^{2/p}$ for

$1\leq p<2$ and

$\mathfrak{S}^{2-2/p}$ for $2root leverage score sampling algorithm achieves a bound of roughly

$d$ for

$1\leq p<2$ , and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly

$d^{2/p}\mathfrak{S}^{2-4/p}$ for $2

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-woodruff23a,
  title = 	 {Sharper Bounds for $\ell_p$ Sensitivity Sampling},
  author =       {Woodruff, David and Yasuda, Taisuke},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {37238--37272},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/woodruff23a/woodruff23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/woodruff23a.html},
  abstract = 	 {In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak{S}$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak{S} d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p\neq 2$ that improve over the general $\mathfrak{S} d$ bound, achieving a bound of roughly $\mathfrak{S}^{2/p}$ for $1\leq p<2$ and $\mathfrak{S}^{2-2/p}$ for $2root leverage score sampling algorithm achieves a bound of roughly $d$ for $1\leq p<2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak{S}^{2-4/p}$ for $2





  
    Endnote
  
  
  
    %0 Conference Paper
%T Sharper Bounds for $\ell_p$ Sensitivity Sampling
%A David Woodruff
%A Taisuke Yasuda
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-woodruff23a
%I PMLR
%P 37238--37272
%U https://proceedings.mlr.press/v202/woodruff23a.html
%V 202
%X In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak{S}$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak{S} d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p\neq 2$ that improve over the general $\mathfrak{S} d$ bound, achieving a bound of roughly $\mathfrak{S}^{2/p}$ for $1\leq p<2$ and $\mathfrak{S}^{2-2/p}$ for $2root leverage score sampling algorithm achieves a bound of roughly $d$ for $1\leq p<2$, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly $d^{2/p}\mathfrak{S}^{2-4/p}$ for $2
  

  
  
  






  
    APA
  
  
  
    
Woodruff, D. & Yasuda, T.. (2023). Sharper Bounds for $\ell_p$ Sensitivity Sampling. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:37238-37272 Available from https://proceedings.mlr.press/v202/woodruff23a.html.


  
  
  
  





  
  Related Material
  
    
      Download PDF
      OpenReview

Sharper Bounds for ℓp\ell_p Sensitivity Sampling

Abstract

Cite this Paper

Related Material

Sharper Bounds for $\ell_p$ Sensitivity Sampling