Quantile Regression for Large-scale Applications

Jiyan Yang, Xiangrui Meng, Michael Mahoney
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):881-887, 2013.

Abstract

Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem. Our algorithm computes a low-distortion subspace-preserving embedding with respect to the loss function of quantile regression. Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.
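As a rough illustration of the loss function the abstract refers to, the sketch below evaluates the standard quantile ("pinball") loss for an intercept-only model on a toy sample. It is not the paper's randomized algorithm; the function names (`pinball_loss`, `total_loss`, `best_constant`) and the data are assumptions made up for this example. It relies on the fact that the loss is piecewise linear in the fitted constant, so a minimizer can be found among the data points themselves.

```python
def pinball_loss(residual, tau):
    """Quantile loss for one residual r = y - prediction, with 0 < tau < 1."""
    # Positive residuals are weighted by tau, negative ones by (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(y, c, tau):
    """Sum of pinball losses when every response is predicted by the constant c."""
    return sum(pinball_loss(yi - c, tau) for yi in y)


def best_constant(y, tau):
    """Brute-force minimizer of the total loss over the observed values.

    Hypothetical helper: the objective is piecewise linear and convex in c,
    so some data point always attains the minimum.
    """
    return min(sorted(y), key=lambda c: total_loss(y, c, tau))


# Toy data with one large outlier (illustrative values only).
y = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(y, 0.5)   # tau = 0.5 recovers the sample median
upper_fit = best_constant(y, 0.9)    # tau = 0.9 targets the upper tail
mean_fit = sum(y) / len(y)           # least-squares fit of a constant

print(median_fit, upper_fit, mean_fit)
```

On this sample the tau = 0.5 fit is the median 3.0, while the least-squares fit (the mean) is pulled to 22.0 by the outlier. With covariates the same loss is minimized over a coefficient vector, which is the problem the abstract describes as expressible as a linear program.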

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-yang13f,
  title = {Quantile Regression for Large-scale Applications},
  author = {Yang, Jiyan and Meng, Xiangrui and Mahoney, Michael},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {881--887},
  year = {2013},
  editor = {Dasgupta, Sanjoy and McAllester, David},
  volume = {28},
  number = {3},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/yang13f.pdf},
  url = {https://proceedings.mlr.press/v28/yang13f.html},
  abstract = {Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, \emph{e.g.}, involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem. Our algorithm computes a low-distortion subspace-preserving embedding with respect to the loss function of quantile regression. Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.}
}
Endnote
%0 Conference Paper
%T Quantile Regression for Large-scale Applications
%A Jiyan Yang
%A Xiangrui Meng
%A Michael Mahoney
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-yang13f
%I PMLR
%P 881--887
%U https://proceedings.mlr.press/v28/yang13f.html
%V 28
%N 3
%X Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem. Our algorithm computes a low-distortion subspace-preserving embedding with respect to the loss function of quantile regression. Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.
RIS
TY - CPAPER
TI - Quantile Regression for Large-scale Applications
AU - Jiyan Yang
AU - Xiangrui Meng
AU - Michael Mahoney
BT - Proceedings of the 30th International Conference on Machine Learning
DA - 2013/05/26
ED - Sanjoy Dasgupta
ED - David McAllester
ID - pmlr-v28-yang13f
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 28
IS - 3
SP - 881
EP - 887
L1 - http://proceedings.mlr.press/v28/yang13f.pdf
UR - https://proceedings.mlr.press/v28/yang13f.html
AB - Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in time that is nearly linear in the size of the input and that, with constant probability, computes a (1+ε) approximate solution to an arbitrary quantile regression problem. Our algorithm computes a low-distortion subspace-preserving embedding with respect to the loss function of quantile regression. Our empirical evaluation illustrates that our algorithm is competitive with the best previous work on small to medium-sized problems, and that it can be implemented in MapReduce-like environments and applied to terabyte-sized problems.
ER -
APA
Yang, J., Meng, X. & Mahoney, M. (2013). Quantile Regression for Large-scale Applications. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):881-887. Available from https://proceedings.mlr.press/v28/yang13f.html.
