Optimal Sketching Bounds for Sparse Linear Regression

Tung Mai; Alexander Munteanu; Cameron Musco; Anup Rao; Chris Schwiegelshohn; David Woodruff

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:11288-11316, 2023.

Abstract

We study oblivious sketching for $k$-sparse linear regression under various loss functions. In particular, we are interested in a distribution over sketching matrices $S\in\mathbb{R}^{m\times n}$ that does not depend on the inputs $A\in\mathbb{R}^{n\times d}$ and $b\in\mathbb{R}^n$, such that, given access to $SA$ and $Sb$, we can recover a $k$-sparse $\tilde x\in\mathbb{R}^d$ with $\|A\tilde x-b\|_f\leq (1+\varepsilon) \min\nolimits_{k\mathrm{-sparse}\,x\in\mathbb{R}^d} \|Ax-b\|_f$. Here $\|\cdot\|_f: \mathbb R^n \rightarrow \mathbb R$ is some loss function, such as an $\ell_p$ norm, or one from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $m=\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression, where $A$ is the identity matrix. For this problem, under the $\ell_2$ norm, we observe an upper bound of $m=O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions, including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $m = o(d)$, showing that $m=O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower-order terms and the dependence on $\mu$.

Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that a sketching dimension of $m =O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.
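
The sketch-and-solve setup described in the abstract can be illustrated with a small, self-contained example. This is only a minimal sketch under stated assumptions, not the paper's construction: it uses a dense Gaussian matrix as a stand-in for the oblivious sketch $S$ (the abstract does not say which distribution is used), restricts to $k=1$ with the $\ell_2$ loss so that brute force over supports is trivial, and picks the sizes `n`, `d`, `m` arbitrarily rather than from the stated bounds. All function names are hypothetical.

```python
import random

def gaussian_sketch(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries, drawn without looking at the data.
    Assumption: a Gaussian sketch stands in for the paper's oblivious sketch."""
    scale = m ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def matmul(S, A):
    """Plain-Python product of an m x n matrix and an n x d matrix."""
    cols = list(zip(*A))
    return [[sum(s * a for s, a in zip(row, col)) for col in cols] for row in S]

def matvec(S, b):
    """Plain-Python product of an m x n matrix and a length-n vector."""
    return [sum(s * v for s, v in zip(row, b)) for row in S]

def best_one_sparse(A, b):
    """Brute-force 1-sparse least squares: fit each column alone, keep the best.
    Returns (column index, coefficient, squared residual)."""
    best = None
    for j, col in enumerate(zip(*A)):
        denom = sum(c * c for c in col)
        if denom == 0.0:
            continue
        coef = sum(c * v for c, v in zip(col, b)) / denom
        res = sum((coef * c - v) ** 2 for c, v in zip(col, b))
        if best is None or res < best[2]:
            best = (j, coef, res)
    return best

rng = random.Random(0)
n, d, m = 200, 8, 40  # illustrative sizes only

# Synthetic instance: b is (almost) twice column 3 of A, so the best 1-sparse x has support {3}.
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
b = [2.0 * row[3] + 0.01 * rng.gauss(0.0, 1.0) for row in A]

# The sketch is drawn independently of (A, b); the solver only sees SA and Sb.
S = gaussian_sketch(m, n, rng)
SA, Sb = matmul(S, A), matvec(S, b)
j_hat, coef_hat, _ = best_one_sparse(SA, Sb)

# Compare the sketched solution's residual on the full data with the true optimum.
full_res = sum((coef_hat * row[j_hat] - v) ** 2 for row, v in zip(A, b))
j_opt, coef_opt, opt_res = best_one_sparse(A, b)
print("sketched support:", j_hat, "coefficient:", coef_hat)
print("residual ratio (sketched / optimal):", full_res / opt_res)
```

The solver touches only the $m\times d$ matrix $SA$ and the length-$m$ vector $Sb$, which is the point of the setup: the ratio printed at the end plays the role of the $(1+\varepsilon)$ factor in the guarantee. For general $k$ the brute-force step would range over all size-$k$ supports, which this example deliberately avoids.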

Cite this Paper

BibTeX


@InProceedings{pmlr-v206-mai23a,
  title =     {Optimal Sketching Bounds for Sparse Linear Regression},
  author =    {Mai, Tung and Munteanu, Alexander and Musco, Cameron and Rao, Anup and Schwiegelshohn, Chris and Woodruff, David},
  booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages =     {11288--11316},
  year =      {2023},
  editor =    {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume =    {206},
  series =    {Proceedings of Machine Learning Research},
  month =     {25--27 Apr},
  publisher = {PMLR},
  pdf =       {https://proceedings.mlr.press/v206/mai23a/mai23a.pdf},
  url =       {https://proceedings.mlr.press/v206/mai23a.html},
  abstract = 	 {We study oblivious sketching for $k$-sparse linear regression under various loss functions. In particular, we are interested in a distribution over sketching matrices $S\in\mathbb{R}^{m\times n}$ that does not depend on the inputs $A\in\mathbb{R}^{n\times d}$ and $b\in\mathbb{R}^n$, such that, given access to $SA$ and $Sb$, we can recover a $k$-sparse $\tilde x\in\mathbb{R}^d$ with $\|A\tilde x-b\|_f\leq (1+\varepsilon) \min\nolimits_{k{\mathrm{-sparse}\,x\in\mathbb{R}^d}} \|Ax-b\|_f$. Here $\|\cdot\|_f: \mathbb R^n \rightarrow \mathbb R$ is some loss function – such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $m=\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression, where $A$ is the identity matrix. For this problem, under the $\ell_2$ norm, we observe an upper bound of $m=O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $m = o(d)$ showing that $m=O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. 
Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $m =O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.}
}

Endnote

%0 Conference Paper
%T Optimal Sketching Bounds for Sparse Linear Regression
%A Tung Mai
%A Alexander Munteanu
%A Cameron Musco
%A Anup Rao
%A Chris Schwiegelshohn
%A David Woodruff
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-mai23a
%I PMLR
%P 11288--11316
%U https://proceedings.mlr.press/v206/mai23a.html
%V 206
%X We study oblivious sketching for $k$-sparse linear regression under various loss functions. In particular, we are interested in a distribution over sketching matrices $S\in\mathbb{R}^{m\times n}$ that does not depend on the inputs $A\in\mathbb{R}^{n\times d}$ and $b\in\mathbb{R}^n$, such that, given access to $SA$ and $Sb$, we can recover a $k$-sparse $\tilde x\in\mathbb{R}^d$ with $\|A\tilde x-b\|_f\leq (1+\varepsilon) \min\nolimits_{k{\mathrm{-sparse}\,x\in\mathbb{R}^d}} \|Ax-b\|_f$. Here $\|\cdot\|_f: \mathbb R^n \rightarrow \mathbb R$ is some loss function – such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $m=\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression, where $A$ is the identity matrix. For this problem, under the $\ell_2$ norm, we observe an upper bound of $m=O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $m = o(d)$ showing that $m=O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. 
Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $m =O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.

APA


Mai, T., Munteanu, A., Musco, C., Rao, A., Schwiegelshohn, C. & Woodruff, D. (2023). Optimal Sketching Bounds for Sparse Linear Regression. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:11288-11316. Available from https://proceedings.mlr.press/v206/mai23a.html.
