Competing with the Empirical Risk Minimizer in a Single Pass

Roy Frostig; Rong Ge; Sham M. Kakade; Aaron Sidford

Proceedings of The 28th Conference on Learning Theory, PMLR 40:728-763, 2015.

Abstract

In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data – commonly referred to as either the empirical risk minimizer (ERM) or the M-estimator – is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties:

1. The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample.
2. The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors.
3. The algorithm’s performance depends on the initial error at a rate that decreases super-polynomially.
4. The algorithm is easily parallelizable.

Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.

Cite this Paper

BibTeX

@InProceedings{pmlr-v40-Frostig15,
  title = 	 {Competing with the Empirical Risk Minimizer in a Single Pass},
  author = 	 {Frostig, Roy and Ge, Rong and Kakade, Sham M. and Sidford, Aaron},
  booktitle = 	 {Proceedings of The 28th Conference on Learning Theory},
  pages = 	 {728--763},
  year = 	 {2015},
  editor = 	 {Grünwald, Peter and Hazan, Elad and Kale, Satyen},
  volume = 	 {40},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Paris, France},
  month = 	 {03--06 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v40/Frostig15.pdf},
  url = 	 {https://proceedings.mlr.press/v40/Frostig15.html},
  abstract = 	 {In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data – commonly referred to as either the empirical risk minimizer (ERM) or the M-estimator – is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on \emph{every} problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties: \begin{enumerate} \item The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample. \item The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors. \item The algorithm’s performance depends on the initial error at a rate that decreases super-polynomially. \item The algorithm is easily parallelizable. \end{enumerate} Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.}
}

Endnote

%0 Conference Paper
%T Competing with the Empirical Risk Minimizer in a Single Pass
%A Roy Frostig
%A Rong Ge
%A Sham M. Kakade
%A Aaron Sidford
%B Proceedings of The 28th Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2015
%E Peter Grünwald
%E Elad Hazan
%E Satyen Kale
%F pmlr-v40-Frostig15
%I PMLR
%P 728--763
%U https://proceedings.mlr.press/v40/Frostig15.html
%V 40
%X In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data – commonly referred to as either the empirical risk minimizer (ERM) or the M-estimator – is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties: (1) The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample. (2) The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors. (3) The algorithm’s performance depends on the initial error at a rate that decreases super-polynomially. (4) The algorithm is easily parallelizable. Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.

RIS

TY  - CPAPER
TI  - Competing with the Empirical Risk Minimizer in a Single Pass
AU  - Roy Frostig
AU  - Rong Ge
AU  - Sham M. Kakade
AU  - Aaron Sidford
BT  - Proceedings of The 28th Conference on Learning Theory
DA  - 2015/06/26
ED  - Peter Grünwald
ED  - Elad Hazan
ED  - Satyen Kale
ID  - pmlr-v40-Frostig15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 40
SP  - 728
EP  - 763
L1  - http://proceedings.mlr.press/v40/Frostig15.pdf
UR  - https://proceedings.mlr.press/v40/Frostig15.html
AB  - In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data – commonly referred to as either the empirical risk minimizer (ERM) or the M-estimator – is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties: (1) The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample. (2) The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors. (3) The algorithm’s performance depends on the initial error at a rate that decreases super-polynomially. (4) The algorithm is easily parallelizable. Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.
ER  -

APA

Frostig, R., Ge, R., Kakade, S.M. & Sidford, A. (2015). Competing with the Empirical Risk Minimizer in a Single Pass. Proceedings of The 28th Conference on Learning Theory, in Proceedings of Machine Learning Research 40:728-763. Available from https://proceedings.mlr.press/v40/Frostig15.html.
