Misspecification in Prediction Problems and Robustness via Improper Learning

Annie Marsden, John Duchi, Gregory Valiant
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2161-2169, 2021.

Abstract

We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model. We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a “proper” predictor—one from the putative model class—relative to the best predictor in the same model class has lower bound scaling at least as $\sqrt{\gamma n}$, where $\gamma$ is a measure of the model misspecification to the true distribution in terms of total variation distance. In contrast, using an aggregation-based (improper) learner, one can obtain regret $d \log n$ for any underlying generating distribution, where $d$ is the dimension of the parameter; we exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family. These results suggest that simple strategies for aggregating multiple learners together should be more robust, and several experiments conform to this hypothesis.
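The aggregation idea behind the improper learner can be illustrated with a standard Bayesian-mixture (exponential-weights) predictor under log loss. The sketch below is not the paper's construction, only a minimal analogue: the "model class" is a hypothetical grid of Bernoulli parameters, the data come from a misspecified (shifting-bias) source, and the learner plays the posterior mixture, which lies in the convex hull of the class rather than in the class itself. The classical mixture bound guarantees its regret against the best fixed parameter is at most $\log K$ for $K$ candidates, regardless of the generating distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized "model class": candidate Bernoulli parameters (illustrative).
thetas = np.linspace(0.05, 0.95, 19)

# Misspecified source: the bias of the bits shifts halfway through, so no
# single candidate matches the whole sequence.
n = 2000
bias = np.where(np.arange(n) < n // 2, 0.2, 0.8)
x = (rng.random(n) < bias).astype(int)

# Improper (aggregation-based) learner: predict with the posterior mixture.
log_weights = np.zeros_like(thetas)  # uniform prior over candidates
improper_loss = 0.0
for t in range(n):
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    p1 = float(w @ thetas)  # mixture prediction of P(x_t = 1)
    improper_loss += -np.log(p1 if x[t] == 1 else 1.0 - p1)
    # Bayesian update: reweight each candidate by its likelihood of x_t.
    log_weights += np.where(x[t] == 1, np.log(thetas), np.log1p(-thetas))

# "Proper" comparator: best single parameter in hindsight.
best_loss = min(
    -np.sum(np.where(x == 1, np.log(th), np.log1p(-th))) for th in thetas
)
regret = improper_loss - best_loss
print(f"mixture regret vs. best fixed parameter: {regret:.3f}")
print(f"log K bound: {np.log(len(thetas)):.3f}")
```

Because the sequential mixture predictions telescope into the marginal likelihood of the uniform mixture, the printed regret is guaranteed to fall in $[0, \log 19 \approx 2.94]$ here, mirroring the abstract's point that aggregation yields regret independent of how badly the model is misspecified.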

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-marsden21a,
  title     = {Misspecification in Prediction Problems and Robustness via Improper Learning},
  author    = {Marsden, Annie and Duchi, John and Valiant, Gregory},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {2161--2169},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/marsden21a/marsden21a.pdf},
  url       = {https://proceedings.mlr.press/v130/marsden21a.html},
  abstract  = {We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model. We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a “proper” predictor—one from the putative model class—relative to the best predictor in the same model class has lower bound scaling at least as $\sqrt{\gamma n}$, where $\gamma$ is a measure of the model misspecification to the true distribution in terms of total variation distance. In contrast, using an aggregation-based (improper) learner, one can obtain regret $d \log n$ for any underlying generating distribution, where $d$ is the dimension of the parameter; we exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family. These results suggest that simple strategies for aggregating multiple learners together should be more robust, and several experiments conform to this hypothesis.}
}
Endnote
%0 Conference Paper
%T Misspecification in Prediction Problems and Robustness via Improper Learning
%A Annie Marsden
%A John Duchi
%A Gregory Valiant
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-marsden21a
%I PMLR
%P 2161--2169
%U https://proceedings.mlr.press/v130/marsden21a.html
%V 130
%X We study probabilistic prediction games when the underlying model is misspecified, investigating the consequences of predicting using an incorrect parametric model. We show that for a broad class of loss functions and parametric families of distributions, the regret of playing a “proper” predictor—one from the putative model class—relative to the best predictor in the same model class has lower bound scaling at least as $\sqrt{\gamma n}$, where $\gamma$ is a measure of the model misspecification to the true distribution in terms of total variation distance. In contrast, using an aggregation-based (improper) learner, one can obtain regret $d \log n$ for any underlying generating distribution, where $d$ is the dimension of the parameter; we exhibit instances in which this is unimprovable even over the family of all learners that may play distributions in the convex hull of the parametric family. These results suggest that simple strategies for aggregating multiple learners together should be more robust, and several experiments conform to this hypothesis.
APA
Marsden, A., Duchi, J. & Valiant, G. (2021). Misspecification in Prediction Problems and Robustness via Improper Learning. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:2161-2169. Available from https://proceedings.mlr.press/v130/marsden21a.html.