[edit]
Exponentially Improved Dimensionality Reduction for l1: Subspace Embeddings and Independence Testing
Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134:3111-3195, 2021.
Abstract
Despite many applications, dimensionality reduction in the ℓ1-norm is much less understood than in the Euclidean norm. We give two new oblivious dimensionality reduction techniques for the ℓ1-norm which improve exponentially over prior ones:
- We design a distribution over random matrices S∈Rr×n, where r=2poly(d/(εδ)), such that given any matrix A∈Rn×d, with probability at least 1−δ, simultaneously for all x, ‖. Note that S is linear, does not depend on A, and maps \ell_1 into \ell_1. Our distribution provides an exponential improvement on the previous best known map of Wang and Woodruff (SODA, 2019), which required r = 2^{2^{\Omega(d)}}, even for constant \varepsilon and \delta. Our bound is optimal, up to a polynomial factor in the exponent, given a known 2^{\textrm{poly}(d)} lower bound for constant \varepsilon and \delta.
- We design a distribution over matrices S \in \mathbb{R}^{k \times n}, where k = 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}, such that given any q-mode tensor A \in (\mathbb{R}^{d})^{\otimes q}, one can estimate the entrywise \ell_1-norm \|A\|_1 from S(A). Moreover, S = S^1 \otimes S^2 \otimes \cdots \otimes S^q and so given vectors u_1, \ldots, u_q \in \mathbb{R}^d, one can compute S(u_1 \otimes u_2 \otimes \cdots \otimes u_q) in time 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}, which is much faster than the d^q time required to form u_1 \otimes u_2 \otimes \cdots \otimes u_q. Our linear map gives a streaming algorithm for independence testing using space 2^{O(q^2)}(\varepsilon^{-1} q \log d)^{O(q)}, improving the previous doubly exponential (\varepsilon^{-1} \log d)^{q^{O(q)}} space bound of Braverman and Ostrovsky (STOC, 2010).
For subspace embeddings, we also study the setting when A is itself drawn from distributions with independent entries, and obtain a polynomial embedding dimension. For independence testing, we also give algorithms for any distance measure with a polylogarithmic-sized sketch and satisfying an approximate triangle inequality.