DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

Christina Heinze; Brian McWilliams; Nicolai Meinshausen

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:875-883, 2016.

Abstract

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication, in which low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error that depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross-validation, as only part of the algorithm depends on the regularization parameter.
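The following is a minimal single-machine sketch of the idea the abstract describes, not the authors' implementation. It assumes ridge regression as the estimation problem, simulates the workers in a loop, and swaps in a plain Gaussian random projection for whatever projection the paper uses. The function name `dual_loco_ridge_sketch` and the parameters `n_workers`, `proj_dim`, and `lam` are hypothetical names chosen for illustration.

```python
import numpy as np


def dual_loco_ridge_sketch(X, y, n_workers, proj_dim, lam, seed=0):
    """Illustrative feature-distributed ridge regression (an assumption-laden sketch).

    Each simulated worker owns a block of columns of X, compresses it with a
    Gaussian random projection, and receives the summed projections of the
    other workers in one communication step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Partition the features (columns), not the samples, across workers.
    blocks = np.array_split(np.arange(p), n_workers)

    # Step 1 (independent of lam): each worker projects its own feature block.
    projected = []
    for idx in blocks:
        Pi = rng.normal(size=(len(idx), proj_dim)) / np.sqrt(proj_dim)
        projected.append(X[:, idx] @ Pi)

    # Single round of communication: the projections are summed once.
    total = np.sum(projected, axis=0)

    beta = np.zeros(p)
    for k, idx in enumerate(blocks):
        # Compressed stand-in for the features held by all other workers.
        others = total - projected[k]
        X_bar = np.hstack([X[:, idx], others])
        # Step 2 (depends on lam): solve the local ridge problem in dual form,
        # alpha = (X_bar X_bar^T + lam I)^{-1} y.
        alpha = np.linalg.solve(X_bar @ X_bar.T + lam * np.eye(n), y)
        # Map back to coefficients for this worker's own raw features only.
        beta[idx] = X[:, idx].T @ alpha
    return beta
```

In this sketch the projection step does not involve `lam`, so a cross-validation loop over regularization values could reuse the projected blocks and repeat only the local solves, which is one way to read the abstract's remark about fast cross-validation. With `n_workers=1` there is nothing to approximate and the function reduces to ordinary ridge regression.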

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-heinze16,
  title = {DUAL-LOCO: Distributing Statistical Estimation Using Random Projections},
  author = {Heinze, Christina and McWilliams, Brian and Meinshausen, Nicolai},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {875--883},
  year = {2016},
  editor = {Gretton, Arthur and Robert, Christian C.},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/heinze16.pdf},
  url = {https://proceedings.mlr.press/v51/heinze16.html},
  abstract = {We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.}
}

Endnote

%0 Conference Paper
%T DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
%A Christina Heinze
%A Brian McWilliams
%A Nicolai Meinshausen
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-heinze16
%I PMLR
%P 875--883
%U https://proceedings.mlr.press/v51/heinze16.html
%V 51
%X We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.

RIS


TY  - CPAPER
TI  - DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
AU  - Christina Heinze
AU  - Brian McWilliams
AU  - Nicolai Meinshausen
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-heinze16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 875
EP  - 883
L1  - http://proceedings.mlr.press/v51/heinze16.pdf
UR  - https://proceedings.mlr.press/v51/heinze16.html
AB  - We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.
ER  -

APA


Heinze, C., McWilliams, B., & Meinshausen, N. (2016). DUAL-LOCO: Distributing Statistical Estimation Using Random Projections. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:875-883. Available from https://proceedings.mlr.press/v51/heinze16.html.
