Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift

Ananya Kumar, Tengyu Ma, Percy Liang, Aditi Raghunathan
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1041-1051, 2022.

Abstract

We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via vanilla ERM. In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms prior state-of-the-art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: strong ID accuracy of the standard model and OOD accuracy of the robust model. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well on both ID and OOD: (1) standard and robust models should be calibrated (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.

Cite this Paper


BibTeX
@InProceedings{pmlr-v180-kumar22a, title = {Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift}, author = {Kumar, Ananya and Ma, Tengyu and Liang, Percy and Raghunathan, Aditi}, booktitle = {Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence}, pages = {1041--1051}, year = {2022}, editor = {Cussens, James and Zhang, Kun}, volume = {180}, series = {Proceedings of Machine Learning Research}, month = {01--05 Aug}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v180/kumar22a/kumar22a.pdf}, url = {https://proceedings.mlr.press/v180/kumar22a.html}, abstract = {We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via vanilla ERM. In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms prior state-of-the-art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: strong ID accuracy of the standard model and OOD accuracy of the robust model. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well on both ID and OOD: (1) standard and robust models should be calibrated (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.} }
Endnote
%0 Conference Paper %T Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift %A Ananya Kumar %A Tengyu Ma %A Percy Liang %A Aditi Raghunathan %B Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2022 %E James Cussens %E Kun Zhang %F pmlr-v180-kumar22a %I PMLR %P 1041--1051 %U https://proceedings.mlr.press/v180/kumar22a.html %V 180 %X We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via vanilla ERM. In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms prior state-of-the-art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: strong ID accuracy of the standard model and OOD accuracy of the robust model. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well on both ID and OOD: (1) standard and robust models should be calibrated (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.
APA
Kumar, A., Ma, T., Liang, P. & Raghunathan, A.. (2022). Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 180:1041-1051 Available from https://proceedings.mlr.press/v180/kumar22a.html.

Related Material