Simple data balancing achieves competitive worst-group-accuracy

Badr Youbi Idrissi, Martin Arjovsky, Mohammad Pezeshki, David Lopez-Paz
Proceedings of the First Conference on Causal Learning and Reasoning, PMLR 177:336-351, 2022.

Abstract

We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art-accuracy, while being faster to train and requiring no additional hyper-parameters. Finally, we highlight that access to group information is most critical for model selection purposes, and not so much during training. All in all, our findings beg closer examination of both benchmarks and methods for future research in worst-group-accuracy optimization.
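The two balancing strategies named in the abstract, subsampling and reweighting, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `subsample_groups` and `reweight_groups` are hypothetical, and the sketch assumes each training example carries a hashable group identifier (for instance a class label, or a (class, attribute) pair).

```python
import random
from collections import Counter, defaultdict


def subsample_groups(groups, seed=0):
    """Return sorted indices keeping an equal number of examples per group.

    Every group is cut down to the size of the smallest group
    (hypothetical helper, for illustration only).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    n_min = min(len(indices) for indices in by_group.values())
    keep = []
    for indices in by_group.values():
        # Sample without replacement so no example is duplicated.
        keep.extend(rng.sample(indices, n_min))
    return sorted(keep)


def reweight_groups(groups):
    """Return one weight per example, inversely proportional to group size.

    Each group ends up with the same total weight, and the weights
    average to 1 over the dataset (hypothetical helper).
    """
    counts = Counter(groups)
    scale = len(groups) / len(counts)
    return [scale / counts[group] for group in groups]


# Toy data: three groups of sizes 4, 2 and 1.
groups = ["a"] * 4 + ["b"] * 2 + ["c"]
kept = subsample_groups(groups)
weights = reweight_groups(groups)
```

Subsampling discards data from the larger groups, whereas reweighting keeps every example and instead scales its contribution to the loss; which one is preferable in a given setting is an empirical question that this sketch does not settle.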

Cite this Paper


BibTeX
@InProceedings{pmlr-v177-idrissi22a,
  title = {Simple data balancing achieves competitive worst-group-accuracy},
  author = {Idrissi, Badr Youbi and Arjovsky, Martin and Pezeshki, Mohammad and Lopez-Paz, David},
  booktitle = {Proceedings of the First Conference on Causal Learning and Reasoning},
  pages = {336--351},
  year = {2022},
  editor = {Schölkopf, Bernhard and Uhler, Caroline and Zhang, Kun},
  volume = {177},
  series = {Proceedings of Machine Learning Research},
  month = {11--13 Apr},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v177/idrissi22a/idrissi22a.pdf},
  url = {https://proceedings.mlr.press/v177/idrissi22a.html},
  abstract = {We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art-accuracy, while being faster to train and requiring no additional hyper-parameters. Finally, we highlight that access to group information is most critical for model selection purposes, and not so much during training. All in all, our findings beg closer examination of both benchmarks and methods for future research in worst-group-accuracy optimization.}
}
Endnote
%0 Conference Paper
%T Simple data balancing achieves competitive worst-group-accuracy
%A Badr Youbi Idrissi
%A Martin Arjovsky
%A Mohammad Pezeshki
%A David Lopez-Paz
%B Proceedings of the First Conference on Causal Learning and Reasoning
%C Proceedings of Machine Learning Research
%D 2022
%E Bernhard Schölkopf
%E Caroline Uhler
%E Kun Zhang
%F pmlr-v177-idrissi22a
%I PMLR
%P 336--351
%U https://proceedings.mlr.press/v177/idrissi22a.html
%V 177
%X We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art-accuracy, while being faster to train and requiring no additional hyper-parameters. Finally, we highlight that access to group information is most critical for model selection purposes, and not so much during training. All in all, our findings beg closer examination of both benchmarks and methods for future research in worst-group-accuracy optimization.
APA
Idrissi, B.Y., Arjovsky, M., Pezeshki, M. & Lopez-Paz, D. (2022). Simple data balancing achieves competitive worst-group-accuracy. Proceedings of the First Conference on Causal Learning and Reasoning, in Proceedings of Machine Learning Research 177:336-351. Available from https://proceedings.mlr.press/v177/idrissi22a.html.
