Conditional Linear Regression

Diego Calderon, Brendan Juba, Sirui Li, Zongyi Li, Lisa Ruan
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:2164-2173, 2020.

Abstract

Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there may not exist a good, simple model for the distribution, so we seek to find a small subset where there exists such a model. We give a computationally efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant portion of the data distribution, described by a k-DNF, along with a linear predictor on that portion with a small loss. In contrast to work in robust statistics on small subsets, our loss bounds do not feature a dependence on the density of the portion we fit, and compared to previous work on conditional linear regression, our algorithm’s running time scales polynomially with the sparsity of the linear predictor. We also demonstrate empirically that our algorithm can leverage this advantage to obtain a k-DNF with a better linear predictor in practice.
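To make the task concrete, the following is a minimal toy sketch (not the authors' algorithm) of the objective the abstract describes: given a k-DNF over Boolean attributes that selects a portion of the data, fit a linear predictor on that portion and compare its squared loss to a fit over all of the data. The data-generating process, variable names, and the particular 2-DNF are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Boolean attributes used by the DNF condition; real-valued features used by the predictor.
b = rng.integers(0, 2, size=(n, 3)).astype(bool)
x = rng.normal(size=(n, 2))

# Assume the data is only well modeled on the portion where the 2-DNF
# (b0 AND b1) OR (NOT b2) holds; elsewhere the labels are pure noise.
subset = (b[:, 0] & b[:, 1]) | ~b[:, 2]
y = np.where(subset, 1.5 * x[:, 0] - 2.0 * x[:, 1],
             rng.normal(scale=5.0, size=n))

def sq_loss(mask):
    # Least-squares fit restricted to the rows selected by `mask`,
    # evaluated on those same rows.
    w, *_ = np.linalg.lstsq(x[mask], y[mask], rcond=None)
    return np.mean((x[mask] @ w - y[mask]) ** 2)

print(f"loss on full data:  {sq_loss(np.ones(n, bool)):.3f}")
print(f"loss on DNF subset: {sq_loss(subset):.3f}")
```

Here the DNF is given; the hard part addressed by the paper is jointly *searching* for a k-DNF describing a significant portion and a (sparse) linear predictor with small loss on it, with guarantees that do not degrade with the density of the selected portion.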

Cite this Paper


BibTeX
@InProceedings{pmlr-v108-calderon20a,
  title     = {Conditional Linear Regression},
  author    = {Calderon, Diego and Juba, Brendan and Li, Sirui and Li, Zongyi and Ruan, Lisa},
  booktitle = {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages     = {2164--2173},
  year      = {2020},
  editor    = {Chiappa, Silvia and Calandra, Roberto},
  volume    = {108},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--28 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v108/calderon20a/calderon20a.pdf},
  url       = {https://proceedings.mlr.press/v108/calderon20a.html},
  abstract  = {Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there may not exist a good, simple model for the distribution, so we seek to find a small subset where there exists such a model. We give a computationally efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant portion of the data distribution, described by a k-DNF, along with a linear predictor on that portion with a small loss. In contrast to work in robust statistics on small subsets, our loss bounds do not feature a dependence on the density of the portion we fit, and compared to previous work on conditional linear regression, our algorithm's running time scales polynomially with the sparsity of the linear predictor. We also demonstrate empirically that our algorithm can leverage this advantage to obtain a k-DNF with a better linear predictor in practice.}
}
Endnote
%0 Conference Paper
%T Conditional Linear Regression
%A Diego Calderon
%A Brendan Juba
%A Sirui Li
%A Zongyi Li
%A Lisa Ruan
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-calderon20a
%I PMLR
%P 2164--2173
%U https://proceedings.mlr.press/v108/calderon20a.html
%V 108
%X Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there may not exist a good, simple model for the distribution, so we seek to find a small subset where there exists such a model. We give a computationally efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant portion of the data distribution, described by a k-DNF, along with a linear predictor on that portion with a small loss. In contrast to work in robust statistics on small subsets, our loss bounds do not feature a dependence on the density of the portion we fit, and compared to previous work on conditional linear regression, our algorithm's running time scales polynomially with the sparsity of the linear predictor. We also demonstrate empirically that our algorithm can leverage this advantage to obtain a k-DNF with a better linear predictor in practice.
APA
Calderon, D., Juba, B., Li, S., Li, Z., & Ruan, L. (2020). Conditional Linear Regression. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:2164-2173. Available from https://proceedings.mlr.press/v108/calderon20a.html.
