Fairness-Enhancing Data Augmentation Methods for Worst-Group Accuracy
Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation, PMLR 279:156-172, 2025.
Abstract
Ensuring fair predictions across many distinct subpopulations in the training data can be prohibitive for large models. Recently, simple linear last layer retraining strategies, in combination with data augmentation methods such as upweighting and downsampling, have been shown to achieve state-of-the-art performance for worst-group accuracy, which quantifies accuracy for the least prevalent subpopulation. For linear last layer retraining and the abovementioned augmentations, we present a comparison of the optimal worst-group accuracy when modeling the distribution of the latent representations (input to the last layer) as Gaussian for each subpopulation. Observing that these augmentation techniques rely heavily on well-labeled subpopulations, we present a comparison of the optimal worst-group accuracy in the setting of label noise. We verify our results for both synthetic and large publicly available datasets.
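As an illustration of the two augmentations named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the common definitions: upweighting assigns each sample a weight inversely proportional to its group's size, downsampling subsamples every group to the size of the smallest group, and worst-group accuracy is taken as the minimum per-group accuracy. The function names (`worst_group_accuracy`, `upweighting_weights`, `downsample_indices`) are hypothetical and chosen for this example only.

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, groups):
    """Minimum per-group accuracy over the groups present in `groups`."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return min(
        float(np.mean(y_pred[groups == g] == y_true[groups == g]))
        for g in np.unique(groups)
    )


def upweighting_weights(groups):
    """Per-sample weights n / (G * n_g), so every group has equal total weight."""
    groups = np.asarray(groups)
    _, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    return len(groups) / (len(counts) * counts[inverse])


def downsample_indices(groups, seed=0):
    """Indices that keep n_min samples per group, n_min being the smallest group size."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()
    keep = [
        rng.choice(np.flatnonzero(groups == g), size=n_min, replace=False)
        for g in labels
    ]
    return np.sort(np.concatenate(keep))
```

In a last layer retraining pipeline, the weights would typically be passed as per-sample weights to the linear classifier's loss, while the indices would select the balanced subset of latent representations used for retraining. Both rely on the group annotations in `groups` being correct, which is why noise in those annotations matters for either method.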