Sparse sketches with small inversion bias

Michal Derezinski; Zhenyu Liao; Edgar Dobriban; Michael Mahoney

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:1467-1510, 2021.

Abstract

For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently constructed estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, then after simple rescaling, the estimator $(\frac m{m-d}\tilde A^\top\tilde A)^{-1}$ is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt d/\epsilon)$. This implies that for $m=O(d)$, the inversion bias of this estimator is $O(1/\sqrt d)$, which is much smaller than the $\Theta(1)$ approximation error obtained as a consequence of the subspace embedding guarantee for sub-gaussian sketches. We then propose a new sketching technique, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings as well as data-aware leverage-based row sampling methods, to get $\epsilon$ inversion bias for sketch size $m=O(d\log d+\sqrt d/\epsilon)$ in time $O(\text{nnz}(A)\log n+md^2)$, where nnz is the number of non-zeros. The key techniques enabling our analysis include an extension of a classical inequality of Bai and Silverstein for random quadratic forms, which we call the Restricted Bai-Silverstein inequality; and anti-concentration of the Binomial distribution via the Paley-Zygmund inequality, which we use to prove a lower bound showing that leverage score sampling sketches generally do not achieve small inversion bias.
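The inversion bias of the dense case can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the paper's code: it assumes Gaussian entries of variance $1/m$ as one particular sub-gaussian choice of $S$, and the function name, matrix sizes, trial count, and seed are all hypothetical. It averages many independent sketched inverses with and without the $\frac m{m-d}$ rescaling and compares each average to $(A^\top A)^{-1}$ in relative spectral norm. It does not implement LESS embeddings.

```python
import numpy as np


def inversion_bias_demo(n=200, d=5, m=20, trials=2000, seed=0):
    """Compare averaged sketched inverses with and without rescaling.

    Assumption: S has i.i.d. N(0, 1/m) entries (a Gaussian sketch).
    Returns the relative spectral-norm errors (plain, rescaled).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))          # tall n x d data matrix
    target = np.linalg.inv(A.T @ A)          # (A^T A)^{-1}

    plain = np.zeros((d, d))
    rescaled = np.zeros((d, d))
    for _ in range(trials):
        S = rng.standard_normal((m, n)) / np.sqrt(m)   # m x n sketch
        SA = S @ A                                      # \tilde A = S A
        G = SA.T @ SA                                   # \tilde A^T \tilde A
        plain += np.linalg.inv(G)
        rescaled += np.linalg.inv(m / (m - d) * G)      # rescaled estimator

    plain /= trials
    rescaled /= trials

    def rel_err(M):
        # Relative error in spectral norm against the true inverse.
        return np.linalg.norm(M - target, 2) / np.linalg.norm(target, 2)

    return rel_err(plain), rel_err(rescaled)


if __name__ == "__main__":
    plain_err, rescaled_err = inversion_bias_demo()
    print(f"plain average:    relative error {plain_err:.3f}")
    print(f"rescaled average: relative error {rescaled_err:.3f}")
```

For this Gaussian special case the standard inverse-Wishart mean gives $E[(\tilde A^\top\tilde A)^{-1}]=\frac m{m-d-1}(A^\top A)^{-1}$, so the unscaled average should stay off by a constant factor however many sketches are averaged, while the rescaled average should land much closer to the target.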

Cite this Paper

BibTeX


@InProceedings{pmlr-v134-derezinski21a,
  title = 	 {Sparse sketches with small inversion bias},
  author =       {Derezinski, Michal and Liao, Zhenyu and Dobriban, Edgar and Mahoney, Michael},
  booktitle = 	 {Proceedings of Thirty Fourth Conference on Learning Theory},
  pages = 	 {1467--1510},
  year = 	 {2021},
  editor = 	 {Belkin, Mikhail and Kpotufe, Samory},
  volume = 	 {134},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {15--19 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v134/derezinski21a/derezinski21a.pdf},
  url = 	 {https://proceedings.mlr.press/v134/derezinski21a.html},
  abstract = 	 {For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently constructed estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, then after simple rescaling, the estimator $(\frac m{m-d}\tilde A^\top\tilde A)^{-1}$ is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt d/\epsilon)$. This implies that for $m=O(d)$, the inversion bias of this estimator is $O(1/\sqrt d)$, which is much smaller than the $\Theta(1)$ approximation error obtained as a consequence of the subspace embedding guarantee for sub-gaussian sketches. We then propose a new sketching technique, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings as well as data-aware leverage-based row sampling methods, to get $\epsilon$ inversion bias for sketch size $m=O(d\log d+\sqrt d/\epsilon)$ in time $O(\text{nnz}(A)\log n+md^2)$, where nnz is the number of non-zeros. The key techniques enabling our analysis include an extension of a classical inequality of Bai and Silverstein for random quadratic forms, which we call the Restricted Bai-Silverstein inequality; and anti-concentration of the Binomial distribution via the Paley-Zygmund inequality, which we use to prove a lower bound showing that leverage score sampling sketches generally do not achieve small inversion bias.}
}

Endnote

%0 Conference Paper
%T Sparse sketches with small inversion bias
%A Michal Derezinski
%A Zhenyu Liao
%A Edgar Dobriban
%A Michael Mahoney
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-derezinski21a
%I PMLR
%P 1467--1510
%U https://proceedings.mlr.press/v134/derezinski21a.html
%V 134
%X For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$. This phenomenon, which we call inversion bias, arises, e.g., in statistics and distributed optimization, when averaging multiple independently constructed estimates of quantities that depend on the inverse covariance. We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices. We show that when the sketching matrix $S$ is dense and has i.i.d. sub-gaussian entries, then after simple rescaling, the estimator $(\frac m{m-d}\tilde A^\top\tilde A)^{-1}$ is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m=O(d+\sqrt d/\epsilon)$. This implies that for $m=O(d)$, the inversion bias of this estimator is $O(1/\sqrt d)$, which is much smaller than the $\Theta(1)$ approximation error obtained as a consequence of the subspace embedding guarantee for sub-gaussian sketches. We then propose a new sketching technique, called LEverage Score Sparsified (LESS) embeddings, which uses ideas from both data-oblivious sparse embeddings as well as data-aware leverage-based row sampling methods, to get $\epsilon$ inversion bias for sketch size $m=O(d\log d+\sqrt d/\epsilon)$ in time $O(\text{nnz}(A)\log n+md^2)$, where nnz is the number of non-zeros. The key techniques enabling our analysis include an extension of a classical inequality of Bai and Silverstein for random quadratic forms, which we call the Restricted Bai-Silverstein inequality; and anti-concentration of the Binomial distribution via the Paley-Zygmund inequality, which we use to prove a lower bound showing that leverage score sampling sketches generally do not achieve small inversion bias.

APA


Derezinski, M., Liao, Z., Dobriban, E. & Mahoney, M. (2021). Sparse sketches with small inversion bias. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:1467-1510. Available from https://proceedings.mlr.press/v134/derezinski21a.html.
