$\ell_p$-Regression in the Arbitrary Partition Model of Communication

Yi Li, Honghao Lin, David Woodruff
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4902-4928, 2023.

Abstract

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p \in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i \in \{-M, -M+1, \ldots, M\}^{n \times d}$ and $b^i \in \{-M, -M+1, \ldots, M\}^n$, and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x \in \mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading-order term depends only linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p \in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p \in (1,2]$. Our bounds considerably improve previous bounds due to Woodruff et al. (COLT 2013) and Vempala et al. (SODA 2020).
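To make the setup concrete, below is a minimal NumPy simulation sketch of the arbitrary partition model for $p = 2$. This is not the paper's protocol: the variable names and the centralized baseline are illustrative assumptions. The servers hold additive shares of $(A, b)$, and a centralized least-squares solve on the summed data gives the optimum cost that any communication-efficient protocol must $(1+\epsilon)$-approximate.

import numpy as np

rng = np.random.default_rng(0)
s, n, d, M = 4, 100, 5, 50  # servers, rows, columns, entry bound M <= poly(nd)

# The summed data (sum_i A^i, sum_i b^i) defining the regression instance.
A = rng.integers(-M, M + 1, size=(n, d))
b = rng.integers(-M, M + 1, size=n)

# Arbitrary partition model: server i holds additive shares (A^i, b^i) with
# sum_i A^i = A and sum_i b^i = b; the individual shares can be arbitrary.
A_shares = [rng.integers(-M, M + 1, size=(n, d)) for _ in range(s - 1)]
b_shares = [rng.integers(-M, M + 1, size=n) for _ in range(s - 1)]
A_shares.append(A - sum(A_shares))
b_shares.append(b - sum(b_shares))

# Communication-oblivious baseline (illustrative): reconstruct the sums and
# solve least squares (p = 2) exactly. A protocol meeting the paper's bound
# must output x_hat with ||A x_hat - b||_2 <= (1 + eps) * opt while sending
# only about sd^2 + sd/eps bits in total.
A_sum, b_sum = sum(A_shares), sum(b_shares)
x_opt, *_ = np.linalg.lstsq(A_sum, b_sum, rcond=None)
opt = np.linalg.norm(A_sum @ x_opt - b_sum)

eps = 0.1
print(f"optimal l2 cost: {opt:.3f}; (1+eps) threshold: {(1 + eps) * opt:.3f}")

For $p \neq 2$ the objective is the same with $\|\cdot\|_p$ in place of $\|\cdot\|_2$; only the $p = 2$ case admits a closed-form centralized solve.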

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-li23b,
  title     = {$\ell_p$-Regression in the Arbitrary Partition Model of Communication},
  author    = {Li, Yi and Lin, Honghao and Woodruff, David},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {4902--4928},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/li23b/li23b.pdf},
  url       = {https://proceedings.mlr.press/v195/li23b.html},
  abstract  = {We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\eps)$-approximate solution to $\min_{x\in\R^n} \norm{(\sum_i A^i)x - (\sum_i b^i)}_p$. Here $M \leq \poly(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\eps + sd/\poly(\eps))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\eps^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\eps)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).}
}
Endnote
%0 Conference Paper
%T $\ell_p$-Regression in the Arbitrary Partition Model of Communication
%A Yi Li
%A Honghao Lin
%A David Woodruff
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-li23b
%I PMLR
%P 4902--4928
%U https://proceedings.mlr.press/v195/li23b.html
%V 195
%X We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\eps)$-approximate solution to $\min_{x\in\R^n} \norm{(\sum_i A^i)x - (\sum_i b^i)}_p$. Here $M \leq \poly(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\eps + sd/\poly(\eps))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\eps^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\eps)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).
APA
Li, Y., Lin, H. & Woodruff, D. (2023). $\ell_p$-Regression in the Arbitrary Partition Model of Communication. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:4902-4928. Available from https://proceedings.mlr.press/v195/li23b.html.
