[edit]
Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR 180:1041-1051, 2022.
Abstract
We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy. A robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via vanilla ERM. In this paper, we find that a simple approach of ensembling the standard and robust models, after calibrating on only ID data, outperforms prior state-of-the-art both ID and OOD. On ten natural distribution shift datasets, ID-calibrated ensembles get the best of both worlds: strong ID accuracy of the standard model and OOD accuracy of the robust model. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well on both ID and OOD: (1) standard and robust models should be calibrated (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.