One-shot Distributed Ridge Regression in High Dimensions

Yue Sheng, Edgar Dobriban
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:8763-8772, 2020.

Abstract

To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental problem in this area: How to do ridge regression in a distributed computing environment? We study one-shot methods constructing weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional model where each predictor has a small effect, we discover several new phenomena: the efficiency depends strongly on the signal strength but does not degrade with many workers; the risk decouples over machines; and, unexpectedly, the optimal weights do not sum to unity. We also propose a new optimally weighted one-shot ridge regression algorithm. Our results are supported by simulations and real data analysis.
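The one-shot scheme described above can be sketched as follows: each machine fits a ridge estimator on its local data, and the coordinator returns a weighted combination of the local estimates. This is an illustrative sketch only; the uniform weights used as a default here are a placeholder, not the paper's optimal weights, which depend on the signal strength and in general do not sum to one.

```python
import numpy as np

def local_ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y on one machine's local data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def one_shot_ridge(X, y, n_machines, lam, weights=None):
    """Split rows across machines, fit ridge locally, combine with weights.

    Uniform weights 1/k are a naive default for illustration; the paper
    derives optimal weights (not implemented here).
    """
    Xs = np.array_split(X, n_machines)
    ys = np.array_split(y, n_machines)
    betas = [local_ridge(Xi, yi, lam) for Xi, yi in zip(Xs, ys)]
    if weights is None:
        weights = np.full(n_machines, 1.0 / n_machines)
    return sum(w * b for w, b in zip(weights, betas))
```

With `n_machines=1` and weight 1, the combination reduces to the usual centralized ridge estimator, which is a convenient sanity check.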

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-sheng20a,
  title     = {One-shot Distributed Ridge Regression in High Dimensions},
  author    = {Sheng, Yue and Dobriban, Edgar},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {8763--8772},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/sheng20a/sheng20a.pdf},
  url       = {https://proceedings.mlr.press/v119/sheng20a.html},
  abstract  = {To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental problem in this area: How to do ridge regression in a distributed computing environment? We study one-shot methods constructing weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high dimensional model where each predictor has a small effect, we discover several new phenomena including that the efficiency depends strongly on the signal strength, but does not degrade with many workers, the risk decouples over machines, and the unexpected consequence that the optimal weights do not sum to unity. We also propose a new optimally weighted one-shot ridge regression algorithm. Our results are supported by simulations and real data analysis.}
}
Endnote
%0 Conference Paper
%T One-shot Distributed Ridge Regression in High Dimensions
%A Yue Sheng
%A Edgar Dobriban
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-sheng20a
%I PMLR
%P 8763--8772
%U https://proceedings.mlr.press/v119/sheng20a.html
%V 119
%X To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental problem in this area: How to do ridge regression in a distributed computing environment? We study one-shot methods constructing weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high dimensional model where each predictor has a small effect, we discover several new phenomena including that the efficiency depends strongly on the signal strength, but does not degrade with many workers, the risk decouples over machines, and the unexpected consequence that the optimal weights do not sum to unity. We also propose a new optimally weighted one-shot ridge regression algorithm. Our results are supported by simulations and real data analysis.
APA
Sheng, Y. & Dobriban, E. (2020). One-shot Distributed Ridge Regression in High Dimensions. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:8763-8772. Available from https://proceedings.mlr.press/v119/sheng20a.html.