[edit]

# Optimal Sketching Bounds for Sparse Linear Regression

*Proceedings of The 26th International Conference on Artificial Intelligence and Statistics*, PMLR 206:11288-11316, 2023.

#### Abstract

We study oblivious sketching for $k$-sparse linear regression under various loss functions. In particular, we are interested in a distribution over sketching matrices $S\in\mathbb{R}^{m\times n}$ that does not depend on the inputs $A\in\mathbb{R}^{n\times d}$ and $b\in\mathbb{R}^n$, such that, given access to $SA$ and $Sb$, we can recover a $k$-sparse $\tilde x\in\mathbb{R}^d$ with $\|A\tilde x-b\|_f\leq (1+\varepsilon) \min\nolimits_{k{\mathrm{-sparse}\,x\in\mathbb{R}^d}} \|Ax-b\|_f$. Here $\|\cdot\|_f: \mathbb R^n \rightarrow \mathbb R$ is some loss function – such as an $\ell_p$ norm, or from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $m=\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression, where $A$ is the identity matrix. For this problem, under the $\ell_2$ norm, we observe an upper bound of $m=O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $m = o(d)$ showing that $m=O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $m =O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.