Omnipredictors for regression and the approximate rank of convex functions

Parikshit Gopalan, Princewill Okoroafor, Prasad Raghavendra, Abhishek Sherry, Mihir Singhal
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:2027-2070, 2024.

Abstract

Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of Gopalan et al. (2021) that introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. The naive generalization of previous approaches to regression is to predict the probability distribution of $\mathbf y$, discretized into $\varepsilon$-width intervals; the running time of this approach is exponential in the size of the omnipredictor's output, which is $1/\varepsilon$. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions, so improved bounds on the latter yield improved runtimes for learning omnipredictors. Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $\varepsilon$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog}(1/\varepsilon)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability, introduced by Gopalan et al. (2023a) for Boolean labels, to the regression setting.
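
To make the abstract's central definitions concrete, the following LaTeX sketch gives one plausible formalization of the $\varepsilon$-approximate rank of a function family and its connection to sufficient statistics. It is inferred from the abstract alone; the paper's own definitions may differ in details such as the choice of norm or the action space.

% A plausible formalization, inferred from the abstract; details (norm,
% action space, Lipschitz constant) may differ from the paper's definitions.
\[
  \mathrm{rank}_{\varepsilon}(\mathcal F)
  \;=\;
  \min\Bigl\{\, d \;:\; \exists\, q_1,\dots,q_d : [0,1] \to \mathbb{R}
  \ \text{such that}\
  \forall f \in \mathcal F,\
  \inf_{c \in \mathbb{R}^d}\ \sup_{y \in [0,1]}
  \Bigl| f(y) - \sum_{i=1}^{d} c_i\, q_i(y) \Bigr| \le \varepsilon \,\Bigr\}.
\]
% Applying this to the family F = { ell(t, .) : ell in L, t an action }:
% if each ell(t, .) is eps-close to sum_i c_i(ell, t) q_i, then
\[
  \mathbb{E}\bigl[\ell(t, \mathbf y) \mid \mathbf x\bigr]
  \;\approx\;
  \sum_{i=1}^{d} c_i(\ell, t)\, \mathbb{E}\bigl[q_i(\mathbf y) \mid \mathbf x\bigr]
  \quad\text{(within $\varepsilon$)},
\]
% so the d conditional expectations E[q_i(y) | x] act as sufficient statistics:
% a predictor that reports them lets one choose an eps-optimal action for every
% loss in L. The discretization baseline corresponds to taking the q_i to be
% indicators of eps-width intervals, giving d = O(1/eps); the paper's main
% result improves this to d = O(1/eps^{2/3}) for convex, Lipschitz losses.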

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-gopalan24b,
  title     = {Omnipredictors for regression and the approximate rank of convex functions},
  author    = {Gopalan, Parikshit and Okoroafor, Princewill and Raghavendra, Prasad and Sherry, Abhishek and Singhal, Mihir},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {2027--2070},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/gopalan24b/gopalan24b.pdf},
  url       = {https://proceedings.mlr.press/v247/gopalan24b.html}
}
Endnote
%0 Conference Paper
%T Omnipredictors for regression and the approximate rank of convex functions
%A Parikshit Gopalan
%A Princewill Okoroafor
%A Prasad Raghavendra
%A Abhishek Sherry
%A Mihir Singhal
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-gopalan24b
%I PMLR
%P 2027--2070
%U https://proceedings.mlr.press/v247/gopalan24b.html
%V 247
APA
Gopalan, P., Okoroafor, P., Raghavendra, P., Sherry, A. & Singhal, M. (2024). Omnipredictors for regression and the approximate rank of convex functions. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:2027-2070. Available from https://proceedings.mlr.press/v247/gopalan24b.html.