[edit]
High-Dimensional Geometric Streaming for Nearly Low Rank Data
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:12588-12605, 2024.
Abstract
We study streaming algorithms for the ℓp subspace approximation problem. Given points a1,…,an as an insertion-only stream and a rank parameter k, the ℓp subspace approximation problem is to find a k-dimensional subspace V such that (∑ni=1d(ai,V)p)1/p is minimized, where d(a,V) denotes the Euclidean distance between a and V defined as min. When p = \infty, we need to find a subspace V that minimizes \max_i d(a_i, V). For \ell_{\infty} subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a \mathrm{poly}(k, \log n) approximate solution. We show that the distortion obtained by our coreset is nearly tight for any sublinear space algorithm. For \ell_p subspace approximation, we show that suitably scaling the points and then using our \ell_{\infty} coreset construction, we can compute a \mathrm{poly}(k, \log n) approximation. Our algorithms are easy to implement and run very fast on large datasets. We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022) which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, and volume estimation.