Conditional Sparse $L_p$-norm Regression With Optimal Probability

John Hainline, Brendan Juba, Hai S. Le, David Woodruff
Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89:1042-1050, 2019.

Abstract

We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes the $\ell_p$ loss of predicting the target $z$ on the distribution conditioned on $c$. Thus, the task is to identify a portion of the distribution on which a linear rule can provide a good fit. Algorithms for this task are useful in cases where some portions of the distribution are not modeled well by simple, learnable rules, but such rules do perform well on other portions. The prior state of the art for this task could only guarantee finding a condition of probability $O(\mu/n^k)$ when a condition of probability $\mu$ exists, and achieved only an $O(n^k)$-approximation to the target loss. Here, we give efficient algorithms for solving this task with a condition $c$ that nearly matches the probability of the ideal condition, while also improving the approximation to the target loss to an $O(n^{k/2})$ factor. We also give an algorithm for finding a $k$-DNF reference class for prediction at a given query point that obtains a sparse regression fit with loss within $O(n^k)$ of optimal among all sparse regression parameters and sufficiently large $k$-DNF reference classes containing the query point.
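In symbols, the task described in the abstract can be paraphrased as the following constrained optimization (our notation, with $D$ denoting the joint distribution over features $x$ and target $z$; the probability constraint is only required to hold approximately, and this is a summary rather than the paper's exact formulation):

$$\min_{c \in k\text{-}\mathrm{DNF},\ f\ \text{linear}} \ \mathbb{E}_{(x,z)\sim D}\!\left[\,|f(x)-z|^p \,\middle|\, c(x)\,\right]^{1/p} \quad \text{subject to} \quad \Pr_{(x,z)\sim D}[c(x)] \ge \mu.$$

As a toy illustration only, and emphatically not the paper's algorithm, the following Python sketch brute-forces the $k = 1$, $p = 2$ case on a finite sample: it enumerates single-literal conditions, keeps those whose empirical probability is at least $\mu$, and fits ordinary least squares on the conditioned subsample. All names here (conditional_fit, X_bool, X_real) are hypothetical.

    # Toy brute-force sketch of the conditional regression task (k = 1, p = 2).
    # Not the paper's algorithm; the exhaustive search is exponential in k in general.
    import numpy as np

    def conditional_fit(X_bool, X_real, z, mu):
        m, n_bool = X_bool.shape
        best_loss, best_cond, best_w = np.inf, None, None
        for j in range(n_bool):
            for val in (True, False):
                mask = (X_bool[:, j] == val)      # condition c: literal x_j or its negation
                if mask.mean() < mu:              # require empirical Pr[c] >= mu
                    continue
                # fit least squares (with intercept) on the conditioned subsample
                A = np.hstack([X_real[mask], np.ones((mask.sum(), 1))])
                w, *_ = np.linalg.lstsq(A, z[mask], rcond=None)
                loss = np.mean((A @ w - z[mask]) ** 2)  # l_2 loss given c
                if loss < best_loss:
                    best_loss, best_cond, best_w = loss, (j, val), w
        return best_cond, best_w, best_loss

    # Usage: z is linear in the real features only where the first boolean holds.
    rng = np.random.default_rng(0)
    Xb = rng.random((500, 4)) < 0.5
    Xr = rng.normal(size=(500, 3))
    z = np.where(Xb[:, 0], Xr @ np.array([1.0, -2.0, 0.5]), rng.normal(size=500))
    print(conditional_fit(Xb, Xr, z, mu=0.3))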

Cite this Paper


BibTeX
@InProceedings{pmlr-v89-hainline19a,
  title     = {Conditional Sparse $L_p$-norm Regression With Optimal Probability},
  author    = {Hainline, John and Juba, Brendan and Le, Hai S. and Woodruff, David},
  booktitle = {Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics},
  pages     = {1042--1050},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Sugiyama, Masashi},
  volume    = {89},
  series    = {Proceedings of Machine Learning Research},
  month     = {16--18 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v89/hainline19a/hainline19a.pdf},
  url       = {https://proceedings.mlr.press/v89/hainline19a.html}
}
Endnote
%0 Conference Paper
%T Conditional Sparse $L_p$-norm Regression With Optimal Probability
%A John Hainline
%A Brendan Juba
%A Hai S. Le
%A David Woodruff
%B Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Masashi Sugiyama
%F pmlr-v89-hainline19a
%I PMLR
%P 1042--1050
%U https://proceedings.mlr.press/v89/hainline19a.html
%V 89
APA
Hainline, J., Juba, B., Le, H. S., & Woodruff, D. (2019). Conditional Sparse $L_p$-norm Regression With Optimal Probability. Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 89:1042-1050. Available from https://proceedings.mlr.press/v89/hainline19a.html.
