Robust Regression on MapReduce

Xiangrui Meng, Michael Mahoney
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):888-896, 2013.

Abstract

Although the MapReduce framework is now the \emphde facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, \emphe.g., the \ell_p regression problem: given a matrix A ∈\mathbbR^m \times n and a vector b ∈\mathbbR^m, find a vector x^* ∈\mathbbR^n that minimizes f(x) = \|A x - b\|_p. The widely-used \ell_2 regression, \emphi.e., linear least-squares, is known to be highly sensitive to outliers; and choosing p ∈[1, 2) can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫n) robust \ell_p regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for \ell_1 regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-meng13b, title = {Robust Regression on MapReduce}, author = {Meng, Xiangrui and Mahoney, Michael}, booktitle = {Proceedings of the 30th International Conference on Machine Learning}, pages = {888--896}, year = {2013}, editor = {Dasgupta, Sanjoy and McAllester, David}, volume = {28}, number = {3}, series = {Proceedings of Machine Learning Research}, address = {Atlanta, Georgia, USA}, month = {17--19 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v28/meng13b.pdf}, url = {https://proceedings.mlr.press/v28/meng13b.html}, abstract = {Although the MapReduce framework is now the \emphde facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, \emphe.g., the \ell_p regression problem: given a matrix A ∈\mathbbR^m \times n and a vector b ∈\mathbbR^m, find a vector x^* ∈\mathbbR^n that minimizes f(x) = \|A x - b\|_p. The widely-used \ell_2 regression, \emphi.e., linear least-squares, is known to be highly sensitive to outliers; and choosing p ∈[1, 2) can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫n) robust \ell_p regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for \ell_1 regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce.} }
Endnote
%0 Conference Paper %T Robust Regression on MapReduce %A Xiangrui Meng %A Michael Mahoney %B Proceedings of the 30th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2013 %E Sanjoy Dasgupta %E David McAllester %F pmlr-v28-meng13b %I PMLR %P 888--896 %U https://proceedings.mlr.press/v28/meng13b.html %V 28 %N 3 %X Although the MapReduce framework is now the \emphde facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, \emphe.g., the \ell_p regression problem: given a matrix A ∈\mathbbR^m \times n and a vector b ∈\mathbbR^m, find a vector x^* ∈\mathbbR^n that minimizes f(x) = \|A x - b\|_p. The widely-used \ell_2 regression, \emphi.e., linear least-squares, is known to be highly sensitive to outliers; and choosing p ∈[1, 2) can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫n) robust \ell_p regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for \ell_1 regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce.
RIS
TY - CPAPER TI - Robust Regression on MapReduce AU - Xiangrui Meng AU - Michael Mahoney BT - Proceedings of the 30th International Conference on Machine Learning DA - 2013/05/26 ED - Sanjoy Dasgupta ED - David McAllester ID - pmlr-v28-meng13b PB - PMLR DP - Proceedings of Machine Learning Research VL - 28 IS - 3 SP - 888 EP - 896 L1 - http://proceedings.mlr.press/v28/meng13b.pdf UR - https://proceedings.mlr.press/v28/meng13b.html AB - Although the MapReduce framework is now the \emphde facto standard for analyzing massive data sets, many algorithms (in particular, many iterative algorithms popular in machine learning, optimization, and linear algebra) are hard to fit into MapReduce. Consider, \emphe.g., the \ell_p regression problem: given a matrix A ∈\mathbbR^m \times n and a vector b ∈\mathbbR^m, find a vector x^* ∈\mathbbR^n that minimizes f(x) = \|A x - b\|_p. The widely-used \ell_2 regression, \emphi.e., linear least-squares, is known to be highly sensitive to outliers; and choosing p ∈[1, 2) can help improve robustness. In this work, we propose an efficient algorithm for solving strongly over-determined (m ≫n) robust \ell_p regression problems to moderate precision on MapReduce. Our empirical results on data up to the terabyte scale demonstrate that our algorithm is a significant improvement over traditional iterative algorithms on MapReduce for \ell_1 regression, even for a fairly small number of iterations. In addition, our proposed interior-point cutting-plane method can also be extended to solving more general convex problems on MapReduce. ER -
APA
Meng, X. & Mahoney, M.. (2013). Robust Regression on MapReduce. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):888-896 Available from https://proceedings.mlr.press/v28/meng13b.html.

Related Material