Heavy-tailed regression with a generalized median-of-means

Daniel Hsu; Sivan Sabato

Heavy-tailed regression with a generalized median-of-means

Daniel Hsu, Sivan Sabato

Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):37-45, 2014.

Abstract

This work proposes a simple and computationally efficient estimator for linear regression, and other smooth and strongly convex loss minimization problems. We prove loss approximation guarantees that hold for general distributions, including those with heavy tails. All prior results only hold for estimators which either assume bounded or subgaussian distributions, require prior knowledge of distributional properties, or are not known to be computationally tractable. In the special case of linear regression with possibly heavy-tailed responses and with bounded and well-conditioned covariates in d-dimensions, we show that a random sample of size \tildeO(d\log(1/δ)) suffices to obtain a constant factor approximation to the optimal loss with probability 1-δ, a minimax optimal sample complexity up to log factors. The core technique used in the proposed estimator is a new generalization of the median-of-means estimator to arbitrary metric spaces.

Cite this Paper

BibTeX


@InProceedings{pmlr-v32-hsu14,
  title = 	 {Heavy-tailed regression with a generalized median-of-means},
  author = 	 {Hsu, Daniel and Sabato, Sivan},
  booktitle = 	 {Proceedings of the 31st International Conference on Machine Learning},
  pages = 	 {37--45},
  year = 	 {2014},
  editor = 	 {Xing, Eric P. and Jebara, Tony},
  volume = 	 {32},
  number =       {2},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Bejing, China},
  month = 	 {22--24 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v32/hsu14.pdf},
  url = 	 {https://proceedings.mlr.press/v32/hsu14.html},
  abstract = 	 {This work proposes a simple and computationally efficient estimator for  linear regression, and other smooth and strongly convex loss minimization  problems.  We prove loss approximation guarantees that hold for general distributions,  including those with heavy tails. All prior results only hold for estimators which  either assume bounded or subgaussian distributions,  require prior knowledge of distributional properties, or are not known to be computationally tractable.  In the special case of linear regression with possibly heavy-tailed responses and with bounded and well-conditioned covariates in d-dimensions, we show that a random sample of size  \tildeO(d\log(1/δ)) suffices to obtain a constant factor  approximation to the optimal loss with probability 1-δ, a minimax optimal sample complexity up to log factors.  The core technique used in the proposed estimator is a new generalization of  the median-of-means estimator to arbitrary metric spaces.}
}

Endnote

%0 Conference Paper
%T Heavy-tailed regression with a generalized median-of-means
%A Daniel Hsu
%A Sivan Sabato
%B Proceedings of the 31st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2014
%E Eric P. Xing
%E Tony Jebara	
%F pmlr-v32-hsu14
%I PMLR
%P 37--45
%U https://proceedings.mlr.press/v32/hsu14.html
%V 32
%N 2
%X This work proposes a simple and computationally efficient estimator for  linear regression, and other smooth and strongly convex loss minimization  problems.  We prove loss approximation guarantees that hold for general distributions,  including those with heavy tails. All prior results only hold for estimators which  either assume bounded or subgaussian distributions,  require prior knowledge of distributional properties, or are not known to be computationally tractable.  In the special case of linear regression with possibly heavy-tailed responses and with bounded and well-conditioned covariates in d-dimensions, we show that a random sample of size  \tildeO(d\log(1/δ)) suffices to obtain a constant factor  approximation to the optimal loss with probability 1-δ, a minimax optimal sample complexity up to log factors.  The core technique used in the proposed estimator is a new generalization of  the median-of-means estimator to arbitrary metric spaces.

RIS


TY  - CPAPER
TI  - Heavy-tailed regression with a generalized median-of-means
AU  - Daniel Hsu
AU  - Sivan Sabato
BT  - Proceedings of the 31st International Conference on Machine Learning
DA  - 2014/06/18
ED  - Eric P. Xing
ED  - Tony Jebara	
ID  - pmlr-v32-hsu14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 32
IS  - 2
SP  - 37
EP  - 45
L1  - http://proceedings.mlr.press/v32/hsu14.pdf
UR  - https://proceedings.mlr.press/v32/hsu14.html
AB  - This work proposes a simple and computationally efficient estimator for  linear regression, and other smooth and strongly convex loss minimization  problems.  We prove loss approximation guarantees that hold for general distributions,  including those with heavy tails. All prior results only hold for estimators which  either assume bounded or subgaussian distributions,  require prior knowledge of distributional properties, or are not known to be computationally tractable.  In the special case of linear regression with possibly heavy-tailed responses and with bounded and well-conditioned covariates in d-dimensions, we show that a random sample of size  \tildeO(d\log(1/δ)) suffices to obtain a constant factor  approximation to the optimal loss with probability 1-δ, a minimax optimal sample complexity up to log factors.  The core technique used in the proposed estimator is a new generalization of  the median-of-means estimator to arbitrary metric spaces.
ER  -

APA


Hsu, D. & Sabato, S.. (2014). Heavy-tailed regression with a generalized median-of-means. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):37-45 Available from https://proceedings.mlr.press/v32/hsu14.html.

Related Material

Download PDF