DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

Christina Heinze, Brian McWilliams, Nicolai Meinshausen
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:875-883, 2016.

Abstract

We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-heinze16,
  title = {DUAL-LOCO: Distributing Statistical Estimation Using Random Projections},
  author = {Heinze, Christina and McWilliams, Brian and Meinshausen, Nicolai},
  booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = {875--883},
  year = {2016},
  editor = {Gretton, Arthur and Robert, Christian C.},
  volume = {51},
  series = {Proceedings of Machine Learning Research},
  address = {Cadiz, Spain},
  month = {09--11 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v51/heinze16.pdf},
  url = {http://proceedings.mlr.press/v51/heinze16.html},
  abstract = {We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.}
}
Endnote
%0 Conference Paper
%T DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
%A Christina Heinze
%A Brian McWilliams
%A Nicolai Meinshausen
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-heinze16
%I PMLR
%P 875--883
%U http://proceedings.mlr.press/v51/heinze16.html
%V 51
%X We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.
RIS
TY  - CPAPER
TI  - DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
AU  - Christina Heinze
AU  - Brian McWilliams
AU  - Nicolai Meinshausen
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-heinze16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 875
EP  - 883
L1  - http://proceedings.mlr.press/v51/heinze16.pdf
UR  - http://proceedings.mlr.press/v51/heinze16.html
AB  - We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.
ER  -
APA
Heinze, C., McWilliams, B. & Meinshausen, N. (2016). DUAL-LOCO: Distributing Statistical Estimation Using Random Projections. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:875-883. Available from http://proceedings.mlr.press/v51/heinze16.html.
