$\ell_p$-Regression in the Arbitrary Partition Model of Communication

Yi Li, Honghao Lin, David Woodruff
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:4902-4928, 2023.

Abstract

We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p \in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i \in \{-M, -M+1, \ldots, M\}^{n \times d}$ and $b^i \in \{-M, -M+1, \ldots, M\}^n$, and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x \in \mathbb{R}^d} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading-order term depends only linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p \in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p \in (1,2]$. Our bounds considerably improve previous bounds due to Woodruff et al. (COLT 2013) and Vempala et al. (SODA 2020).
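To make the setup concrete, below is a minimal NumPy simulation sketch of the arbitrary partition model for $p = 2$. This is not the paper's protocol: the variable names and the centralized baseline are illustrative assumptions. The servers hold additive shares of $(A, b)$, and a centralized least-squares solve on the summed data gives the optimum cost that any communication-efficient protocol must $(1+\epsilon)$-approximate.

import numpy as np

rng = np.random.default_rng(0)
s, n, d, M = 4, 100, 5, 50  # servers, rows, columns, entry bound M <= poly(nd)

# The summed data (sum_i A^i, sum_i b^i) defining the regression instance.
A = rng.integers(-M, M + 1, size=(n, d))
b = rng.integers(-M, M + 1, size=n)

# Arbitrary partition model: server i holds additive shares (A^i, b^i) with
# sum_i A^i = A and sum_i b^i = b; the individual shares can be arbitrary.
A_shares = [rng.integers(-M, M + 1, size=(n, d)) for _ in range(s - 1)]
b_shares = [rng.integers(-M, M + 1, size=n) for _ in range(s - 1)]
A_shares.append(A - sum(A_shares))
b_shares.append(b - sum(b_shares))

# Communication-oblivious baseline (illustrative): reconstruct the sums and
# solve least squares (p = 2) exactly. A protocol meeting the paper's bound
# must output x_hat with ||A x_hat - b||_2 <= (1 + eps) * opt while sending
# only about sd^2 + sd/eps bits in total.
A_sum, b_sum = sum(A_shares), sum(b_shares)
x_opt, *_ = np.linalg.lstsq(A_sum, b_sum, rcond=None)
opt = np.linalg.norm(A_sum @ x_opt - b_sum)

eps = 0.1
print(f"optimal l2 cost: {opt:.3f}; (1+eps) threshold: {(1 + eps) * opt:.3f}")

For $p \neq 2$ the objective is the same with $\|\cdot\|_p$ in place of $\|\cdot\|_2$; only the $p = 2$ case admits a closed-form centralized solve.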

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-li23b,
  title     = {$\ell_p$-Regression in the Arbitrary Partition Model of Communication},
  author    = {Li, Yi and Lin, Honghao and Woodruff, David},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {4902--4928},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/li23b/li23b.pdf},
  url       = {https://proceedings.mlr.press/v195/li23b.html},
  abstract  = {We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\eps)$-approximate solution to $\min_{x\in\R^n} \norm{(\sum_i A^i)x - (\sum_i b^i)}_p$. Here $M \leq \poly(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\eps + sd/\poly(\eps))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\eps^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\eps)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).}
}
Endnote
%0 Conference Paper
%T $\ell_p$-Regression in the Arbitrary Partition Model of Communication
%A Yi Li
%A Honghao Lin
%A David Woodruff
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-li23b
%I PMLR
%P 4902--4928
%U https://proceedings.mlr.press/v195/li23b.html
%V 195
%X We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\eps)$-approximate solution to $\min_{x\in\R^n} \norm{(\sum_i A^i)x - (\sum_i b^i)}_p$. Here $M \leq \poly(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\eps + sd/\poly(\eps))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\eps^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\eps)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).
APA
Li, Y., Lin, H. & Woodruff, D. (2023). $\ell_p$-Regression in the Arbitrary Partition Model of Communication. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:4902-4928. Available from https://proceedings.mlr.press/v195/li23b.html.
