Semi-supervised Group DRO: Combating Sparsity with Unlabeled Data
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:125-160, 2024.
Abstract
In this work we formulate the problem of group distributionally robust optimization (DRO) in a semi-supervised setting. Motivated by applications in robustness and fairness, the goal in group DRO is to learn a hypothesis that minimizes the worst-case performance over a pre-specified set of groups defined over the data distribution. In contrast to existing work that assumes access to labeled data from each of the groups, we consider the practical setting where many groups may have little to no labeled data. We design near-optimal learning algorithms in this setting by leveraging the unlabeled data from different groups. The performance of our algorithms can be characterized in terms of a natural quantity that captures the similarity among the groups and the maximum best-in-class error across them. Furthermore, for the special case of the squared loss and a convex function class, we show that the dependence on the best-in-class error can be avoided. We also derive sample complexity bounds for our proposed semi-supervised algorithm.
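For reference, the worst-case objective described in the abstract is typically written as the following minimax problem; the notation below is illustrative and not taken from the paper. Given a hypothesis class $\mathcal{H}$, groups $g = 1, \dots, G$ with distributions $P_g$, and a loss function $\ell$, group DRO seeks

\[
\hat{h} \in \arg\min_{h \in \mathcal{H}} \; \max_{g \in \{1, \dots, G\}} \; \mathbb{E}_{(x, y) \sim P_g}\!\left[ \ell\big(h(x), y\big) \right],
\]

i.e., rather than minimizing the average loss over the pooled data, the learner minimizes the largest expected loss incurred on any single group.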