Exponentially Improved Dimensionality Reduction for l1: Subspace Embeddings and Independence Testing

Yi Li, David Woodruff, Taisuke Yasuda

Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3111-3195, 2021.

Abstract

Despite many applications, dimensionality reduction in the $\ell_1$-norm is much less understood than in the Euclidean norm. We give two new oblivious dimensionality reduction techniques for the $\ell_1$-norm which improve exponentially over prior ones:
- We design a distribution over random matrices $S \in \mathbb{R}^{r \times n}$, where $r = 2^{\textrm{poly}(d/(\varepsilon \delta))}$, such that given any matrix $A \in \mathbb{R}^{n \times d}$, with probability at least $1-\delta$, simultaneously for all $x$, $\|SAx\|_1 = (1 \pm \varepsilon)\|Ax\|_1$. Note that $S$ is linear, does not depend on $A$, and maps $\ell_1$ into $\ell_1$. Our distribution provides an exponential improvement on the previous best known map of Wang and Woodruff (SODA, 2019), which required $r = 2^{2^{\Omega(d)}}$, even for constant $\varepsilon$ and $\delta$. Our bound is optimal, up to a polynomial factor in the exponent, given a known $2^{\textrm{poly}(d)}$ lower bound for constant $\varepsilon$ and $\delta$.
- We design a distribution over matrices $S \in \mathbb{R}^{k \times n}$, where $k = 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, such that given any $q$-mode tensor $A \in (\mathbb{R}^{d})^{\otimes q}$, one can estimate the entrywise $\ell_1$-norm $\|A\|_1$ from $S(A)$. Moreover, $S = S^1 \otimes S^2 \otimes \cdots \otimes S^q$ and so given vectors $u_1, \ldots, u_q \in \mathbb{R}^d$, one can compute $S(u_1 \otimes u_2 \otimes \cdots \otimes u_q)$ in time $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, which is much faster than the $d^q$ time required to form $u_1 \otimes u_2 \otimes \cdots \otimes u_q$. Our linear map gives a streaming algorithm for independence testing using space $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, improving the previous doubly exponential $(\varepsilon^{-1} \log d)^{q^{O(q)}}$ space bound of Braverman and Ostrovsky (STOC, 2010).
For subspace embeddings, we also study the setting when $A$ is itself drawn from distributions with independent entries, and obtain a polynomial embedding dimension. For independence testing, we also give algorithms for any distance measure with a polylogarithmic-sized sketch and satisfying an approximate triangle inequality.

Cite this Paper

BibTeX

@InProceedings{pmlr-v134-li21c,
title = {Exponentially Improved Dimensionality Reduction for l1: Subspace Embeddings and Independence Testing},
author = {Li, Yi and Woodruff, David and Yasuda, Taisuke},
booktitle = {Proceedings of Thirty Fourth Conference on Learning Theory},
pages = {3111--3195},
year = {2021},
editor = {Belkin, Mikhail and Kpotufe, Samory},
volume = {134},
series = {Proceedings of Machine Learning Research},
month = {15--19 Aug},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v134/li21c/li21c.pdf},
url = {https://proceedings.mlr.press/v134/li21c.html},
abstract = {Despite many applications, dimensionality reduction in the $\ell_1$-norm is much less understood than in the Euclidean norm. We give two new oblivious dimensionality reduction techniques for the $\ell_1$-norm which improve exponentially over prior ones:
- We design a distribution over random matrices $S \in \mathbb{R}^{r \times n}$, where $r = 2^{\textrm{poly}(d/(\varepsilon \delta))}$, such that given any matrix $A \in \mathbb{R}^{n \times d}$, with probability at least $1-\delta$, simultaneously for all $x$, $\|SAx\|_1 = (1 \pm \varepsilon)\|Ax\|_1$. Note that $S$ is linear, does not depend on $A$, and maps $\ell_1$ into $\ell_1$. Our distribution provides an exponential improvement on the previous best known map of Wang and Woodruff (SODA, 2019), which required $r = 2^{2^{\Omega(d)}}$, even for constant $\varepsilon$ and $\delta$. Our bound is optimal, up to a polynomial factor in the exponent, given a known $2^{\textrm{poly}(d)}$ lower bound for constant $\varepsilon$ and $\delta$.
- We design a distribution over matrices $S \in \mathbb{R}^{k \times n}$, where $k = 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, such that given any $q$-mode tensor $A \in (\mathbb{R}^{d})^{\otimes q}$, one can estimate the entrywise $\ell_1$-norm $\|A\|_1$ from $S(A)$. Moreover, $S = S^1 \otimes S^2 \otimes \cdots \otimes S^q$ and so given vectors $u_1, \ldots, u_q \in \mathbb{R}^d$, one can compute $S(u_1 \otimes u_2 \otimes \cdots \otimes u_q)$ in time $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, which is much faster than the $d^q$ time required to form $u_1 \otimes u_2 \otimes \cdots \otimes u_q$. Our linear map gives a streaming algorithm for independence testing using space $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, improving the previous doubly exponential $(\varepsilon^{-1} \log d)^{q^{O(q)}}$ space bound of Braverman and Ostrovsky (STOC, 2010).
For subspace embeddings, we also study the setting when $A$ is itself drawn from distributions with independent entries, and obtain a polynomial embedding dimension. For independence testing, we also give algorithms for any distance measure with a polylogarithmic-sized sketch and satisfying an approximate triangle inequality.}
}

Endnote

%0 Conference Paper
%T Exponentially Improved Dimensionality Reduction for l1: Subspace Embeddings and Independence Testing
%A Yi Li
%A David Woodruff
%A Taisuke Yasuda
%B Proceedings of Thirty Fourth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2021
%E Mikhail Belkin
%E Samory Kpotufe
%F pmlr-v134-li21c
%I PMLR
%P 3111--3195
%U https://proceedings.mlr.press/v134/li21c.html
%V 134
%X Despite many applications, dimensionality reduction in the $\ell_1$-norm is much less understood than in the Euclidean norm. We give two new oblivious dimensionality reduction techniques for the $\ell_1$-norm which improve exponentially over prior ones:
- We design a distribution over random matrices $S \in \mathbb{R}^{r \times n}$, where $r = 2^{\textrm{poly}(d/(\varepsilon \delta))}$, such that given any matrix $A \in \mathbb{R}^{n \times d}$, with probability at least $1-\delta$, simultaneously for all $x$, $\|SAx\|_1 = (1 \pm \varepsilon)\|Ax\|_1$. Note that $S$ is linear, does not depend on $A$, and maps $\ell_1$ into $\ell_1$. Our distribution provides an exponential improvement on the previous best known map of Wang and Woodruff (SODA, 2019), which required $r = 2^{2^{\Omega(d)}}$, even for constant $\varepsilon$ and $\delta$. Our bound is optimal, up to a polynomial factor in the exponent, given a known $2^{\textrm{poly}(d)}$ lower bound for constant $\varepsilon$ and $\delta$.
- We design a distribution over matrices $S \in \mathbb{R}^{k \times n}$, where $k = 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, such that given any $q$-mode tensor $A \in (\mathbb{R}^{d})^{\otimes q}$, one can estimate the entrywise $\ell_1$-norm $\|A\|_1$ from $S(A)$. Moreover, $S = S^1 \otimes S^2 \otimes \cdots \otimes S^q$ and so given vectors $u_1, \ldots, u_q \in \mathbb{R}^d$, one can compute $S(u_1 \otimes u_2 \otimes \cdots \otimes u_q)$ in time $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, which is much faster than the $d^q$ time required to form $u_1 \otimes u_2 \otimes \cdots \otimes u_q$. Our linear map gives a streaming algorithm for independence testing using space $2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}$, improving the previous doubly exponential $(\varepsilon^{-1} \log d)^{q^{O(q)}}$ space bound of Braverman and Ostrovsky (STOC, 2010).
For subspace embeddings, we also study the setting when $A$ is itself drawn from distributions with independent entries, and obtain a polynomial embedding dimension. For independence testing, we also give algorithms for any distance measure with a polylogarithmic-sized sketch and satisfying an approximate triangle inequality.

APA

Li, Y., Woodruff, D. & Yasuda, T.. (2021). Exponentially Improved Dimensionality Reduction for l1: Subspace Embeddings and Independence Testing. Proceedings of Thirty Fourth Conference on Learning Theory, in Proceedings of Machine Learning Research 134:3111-3195 Available from https://proceedings.mlr.press/v134/li21c.html.