Optimal Deterministic Coresets for Ridge Regression

Praneeth Kacham; David Woodruff

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:4141-4150, 2020.

Abstract

We consider the ridge regression problem, for which we are given an $n \times d$ matrix $A$ of examples and a corresponding $n \times d'$ matrix $B$ of labels, as well as a ridge parameter $\lambda \geq 0$, and would like to output an $X' \in \mathbb{R}^{d \times d'}$ for which $\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT}$, where ${OPT} = \min_{Y \in \mathbb{R}^{d \times d'}} \|AY-B\|_F^2 + \lambda \|Y\|_F^2$. In the special case of $\lambda = 0$, this is ordinary multi-response linear regression. Our focus is on deterministically constructing coresets for this problem. Here the goal is to select and re-weight a small subset of rows of $A$ and corresponding labels of $B$, denoted by $SA$ and $SB$, so that if $X'$ is the minimizer to $\min_{X'} \|SAX'-SB\|_F^2 + \lambda \|X'\|_F^2$, then $\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT}$. We show how to efficiently ($\mathrm{poly}(n,d,1/\epsilon)$ time) and deterministically select $O({sd}_{\lambda}/\epsilon)$ rows of $A$ and $B$ to achieve this property, and prove a matching lower bound, showing that it is necessary to select $\Omega({sd}_{\lambda}/\epsilon)$ rows no matter what the weights are, for any $1 < 1/\epsilon \leq {sd}_{\lambda}$. Here ${sd}_{\lambda}$ is the statistical dimension of the input, and we assume $d' = O({sd}_{\lambda}) \leq d$. In the case of ordinary regression, this gives a deterministic algorithm achieving $O(d/\epsilon)$ rows and a matching lower bound for any $1 \leq 1/\epsilon \leq d$; for $1/\epsilon > d$ we show $\Theta(d^2)$ rows are sufficient. Finally we show our new coresets are mergeable, giving a deterministic protocol for ridge regression with $O({sd}_{\lambda}/\epsilon)$ words of communication per server, in the important case when the rows of $A$ and $B$ have a constant number of non-zero entries and there are a constant number of servers. Prior to our work the best deterministic protocols in this setting required $\Omega(\min({sd}_{\lambda}^2,{sd}_{\lambda}/\epsilon^2))$ communication.
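The quantities in the abstract can be made concrete with a small NumPy sketch. It is illustrative only and does not implement the paper's deterministic selection algorithm. The helper names (`ridge_cost`, `ridge_solve`, `statistical_dimension`, `coreset_ratio`) are hypothetical, the statistical dimension is assumed to follow the commonly used definition $\sum_i \sigma_i^2/(\sigma_i^2+\lambda)$ over the nonzero singular values $\sigma_i$ of $A$ (the abstract does not spell it out), and the row subset in the demo is an arbitrary choice made only to show how the $(1+\epsilon)$ guarantee would be checked.

```python
import numpy as np


def ridge_cost(A, B, X, lam):
    """Ridge objective ||AX - B||_F^2 + lam * ||X||_F^2."""
    return np.linalg.norm(A @ X - B, "fro") ** 2 + lam * np.linalg.norm(X, "fro") ** 2


def ridge_solve(A, B, lam):
    """Minimize the ridge objective via least squares on the augmented
    system [A; sqrt(lam) I] X ~ [B; 0], which has the same objective value."""
    d = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(d)])
    B_aug = np.vstack([B, np.zeros((d, B.shape[1]))])
    X, *_ = np.linalg.lstsq(A_aug, B_aug, rcond=None)
    return X


def statistical_dimension(A, lam):
    """Assumed definition: sum of s^2 / (s^2 + lam) over nonzero singular
    values s of A. For lam = 0 this reduces to the rank of A."""
    s = np.linalg.svd(A, compute_uv=False)
    s2 = s[s > 0] ** 2
    return float(np.sum(s2 / (s2 + lam)))


def coreset_ratio(A, B, rows, scales, lam):
    """Solve ridge regression on the rescaled rows (SA, SB) and return
    cost(X') / OPT measured on the full data. A value of at most
    1 + eps means the subset meets the guarantee for that eps."""
    scales = np.asarray(scales, dtype=float)
    SA = scales[:, None] * A[rows]  # each kept row multiplied by its scale
    SB = scales[:, None] * B[rows]
    X_sub = ridge_solve(SA, SB, lam)
    opt = ridge_cost(A, B, ridge_solve(A, B, lam), lam)
    return ridge_cost(A, B, X_sub, lam) / opt


# Demo on synthetic data (fixed seed; all sizes are arbitrary).
rng = np.random.default_rng(0)
n, d, d_out, lam = 200, 5, 2, 1.0
A = rng.standard_normal((n, d))
B = A @ rng.standard_normal((d, d_out)) + 0.1 * rng.standard_normal((n, d_out))

# Arbitrary subset: every 4th row, scaled by sqrt(4). NOT the paper's rule.
rows = np.arange(0, n, 4)
scales = np.full(rows.shape[0], 2.0)
ratio = coreset_ratio(A, B, rows, scales, lam)

print("statistical dimension:", statistical_dimension(A, lam))
print("cost ratio of subset solution to OPT:", ratio)
```

Because ${OPT}$ is the minimum of the objective, the returned ratio is never below 1. Keeping every row with scale 1 gives a ratio of exactly 1 up to floating-point error.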

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-kacham20a,
  title = 	 {Optimal Deterministic Coresets for Ridge Regression},
  author =       {Kacham, Praneeth and Woodruff, David},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {4141--4150},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/kacham20a/kacham20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/kacham20a.html},
  abstract = 	 {We consider the ridge regression problem, for which we are given an $n \times d$ matrix $A$ of examples and a corresponding $n \times d'$ matrix $B$ of labels, as well as a ridge parameter $\lambda \geq 0$, and would like to output an $X' \in \mathbb{R}^{d \times d'}$ for which $$\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT},$$ where ${OPT} = \min_{Y \in \mathbb{R}^{d \times d'}} \|AY-B\|_F^2 + \lambda \|Y\|_F^2.$ In the special case of $\lambda = 0$, this is ordinary multi-response linear regression. Our focus is on deterministically constructing coresets for this problem. Here the goal is to select and re-weight a small subset of rows of $A$ and corresponding labels of $B$, denoted by $SA$ and $SB$, so that if $X'$ is the minimizer to $\min_{X'} \|SAX'-SB\|_F^2 + \lambda \|X'\|_F^2$, then $\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT}$. We show how to efficiently ($\mathrm{poly}(n,d,1/\epsilon)$ time) and deterministically select $O({sd}_{\lambda}/\epsilon)$ rows of $A$ and $B$ to achieve this property, and prove a matching lower bound, showing that it is necessary to select $\Omega({sd}_{\lambda}/\epsilon)$ rows no matter what the weights are, for any $1 < 1/\epsilon \leq {sd}_{\lambda}$. Here ${sd}_{\lambda}$ is the statistical dimension of the input, and we assume $d' = O({sd}_{\lambda}) \leq d$. In the case of ordinary regression, this gives a deterministic algorithm achieving $O(d/\epsilon)$ rows and a matching lower bound for any $1 \leq 1/\epsilon \leq d$; for $1/\epsilon > d$ we show $\Theta(d^2)$ rows are sufficient. Finally we show our new coresets are mergeable, giving a deterministic protocol for ridge regression with $O({sd}_{\lambda}/\epsilon)$ words of communication per server, in the important case when the rows of $A$ and $B$ have a constant number of non-zero entries and there are a constant number of servers. Prior to our work the best deterministic protocols in this setting required $\Omega(\min({sd}_{\lambda}^2,{sd}_{\lambda}/\epsilon^2))$ communication.}
}

Endnote

%0 Conference Paper
%T Optimal Deterministic Coresets for Ridge Regression
%A Praneeth Kacham
%A David Woodruff
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-kacham20a
%I PMLR
%P 4141--4150
%U https://proceedings.mlr.press/v108/kacham20a.html
%V 108
%X We consider the ridge regression problem, for which we are given an $n \times d$ matrix $A$ of examples and a corresponding $n \times d'$ matrix $B$ of labels, as well as a ridge parameter $\lambda \geq 0$, and would like to output an $X' \in \mathbb{R}^{d \times d'}$ for which $$\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT},$$ where ${OPT} = \min_{Y \in \mathbb{R}^{d \times d'}} \|AY-B\|_F^2 + \lambda \|Y\|_F^2.$ In the special case of $\lambda = 0$, this is ordinary multi-response linear regression. Our focus is on deterministically constructing coresets for this problem. Here the goal is to select and re-weight a small subset of rows of $A$ and corresponding labels of $B$, denoted by $SA$ and $SB$, so that if $X'$ is the minimizer to $\min_{X'} \|SAX'-SB\|_F^2 + \lambda \|X'\|_F^2$, then $\|AX'-B\|_F^2 + \lambda \|X'\|_F^2 \leq (1+\epsilon){OPT}$. We show how to efficiently ($\mathrm{poly}(n,d,1/\epsilon)$ time) and deterministically select $O({sd}_{\lambda}/\epsilon)$ rows of $A$ and $B$ to achieve this property, and prove a matching lower bound, showing that it is necessary to select $\Omega({sd}_{\lambda}/\epsilon)$ rows no matter what the weights are, for any $1 < 1/\epsilon \leq {sd}_{\lambda}$. Here ${sd}_{\lambda}$ is the statistical dimension of the input, and we assume $d' = O({sd}_{\lambda}) \leq d$. In the case of ordinary regression, this gives a deterministic algorithm achieving $O(d/\epsilon)$ rows and a matching lower bound for any $1 \leq 1/\epsilon \leq d$; for $1/\epsilon > d$ we show $\Theta(d^2)$ rows are sufficient. Finally we show our new coresets are mergeable, giving a deterministic protocol for ridge regression with $O({sd}_{\lambda}/\epsilon)$ words of communication per server, in the important case when the rows of $A$ and $B$ have a constant number of non-zero entries and there are a constant number of servers. Prior to our work the best deterministic protocols in this setting required $\Omega(\min({sd}_{\lambda}^2,{sd}_{\lambda}/\epsilon^2))$ communication.

APA


Kacham, P. & Woodruff, D. (2020). Optimal Deterministic Coresets for Ridge Regression. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:4141-4150. Available from https://proceedings.mlr.press/v108/kacham20a.html.
