Semi-supervised Group DRO: Combating Sparsity with Unlabeled Data

Pranjal Awasthi, Satyen Kale, Ankit Pensia
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:125-160, 2024.

Abstract

In this work we formulate the problem of group distributionally robust optimization (DRO) in a semi-supervised setting. Motivated by applications in robustness and fairness, the goal in group DRO is to learn a hypothesis that minimizes the worst-case performance over a pre-specified set of groups defined over the data distribution. In contrast to existing work that assumes access to labeled data from each of the groups, we consider the practical setting where many groups may have little to no labeled data. We design near-optimal learning algorithms in this setting by leveraging the unlabeled data from the different groups. The performance of our algorithms can be characterized in terms of a natural quantity that captures the similarity among the groups, as well as the maximum best-in-class error across the groups. Furthermore, for the special case of squared loss and a convex function class, we show that the dependence on the best-in-class error can be avoided. We also derive sample complexity bounds for our proposed semi-supervised algorithm.
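For concreteness, the group DRO objective described above can be written as a min-max problem; the notation below is the standard formulation and is not taken verbatim from the paper ($\mathcal{H}$ denotes the hypothesis class, $P_1, \ldots, P_K$ the group distributions, and $\ell$ the loss function):

$$\min_{h \in \mathcal{H}} \ \max_{k \in \{1, \ldots, K\}} \ \mathbb{E}_{(x, y) \sim P_k} \big[ \ell(h(x), y) \big]$$

In the semi-supervised setting studied here, labeled samples from some of the $P_k$ may be scarce or absent, while unlabeled samples from the groups remain available.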

Cite this Paper

BibTeX
@InProceedings{pmlr-v237-awasthi24a,
  title     = {Semi-supervised Group DRO: Combating Sparsity with Unlabeled Data},
  author    = {Awasthi, Pranjal and Kale, Satyen and Pensia, Ankit},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {125--160},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/awasthi24a/awasthi24a.pdf},
  url       = {https://proceedings.mlr.press/v237/awasthi24a.html},
  abstract  = {In this work we formulate the problem of group distributionally robust optimization (DRO) in a semi-supervised setting. Motivated by applications in robustness and fairness, the goal in group DRO is to learn a hypothesis that minimizes the worst-case performance over a pre-specified set of groups defined over the data distribution. In contrast to existing work that assumes access to labeled data from each of the groups, we consider the practical setting where many groups may have little to no labeled data. We design near-optimal learning algorithms in this setting by leveraging the unlabeled data from the different groups. The performance of our algorithms can be characterized in terms of a natural quantity that captures the similarity among the groups, as well as the maximum best-in-class error across the groups. Furthermore, for the special case of squared loss and a convex function class, we show that the dependence on the best-in-class error can be avoided. We also derive sample complexity bounds for our proposed semi-supervised algorithm.}
}
Endnote
%0 Conference Paper
%T Semi-supervised Group DRO: Combating Sparsity with Unlabeled Data
%A Pranjal Awasthi
%A Satyen Kale
%A Ankit Pensia
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-awasthi24a
%I PMLR
%P 125--160
%U https://proceedings.mlr.press/v237/awasthi24a.html
%V 237
%X In this work we formulate the problem of group distributionally robust optimization (DRO) in a semi-supervised setting. Motivated by applications in robustness and fairness, the goal in group DRO is to learn a hypothesis that minimizes the worst-case performance over a pre-specified set of groups defined over the data distribution. In contrast to existing work that assumes access to labeled data from each of the groups, we consider the practical setting where many groups may have little to no labeled data. We design near-optimal learning algorithms in this setting by leveraging the unlabeled data from the different groups. The performance of our algorithms can be characterized in terms of a natural quantity that captures the similarity among the groups, as well as the maximum best-in-class error across the groups. Furthermore, for the special case of squared loss and a convex function class, we show that the dependence on the best-in-class error can be avoided. We also derive sample complexity bounds for our proposed semi-supervised algorithm.
APA
Awasthi, P., Kale, S. & Pensia, A. (2024). Semi-supervised Group DRO: Combating Sparsity with Unlabeled Data. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:125-160. Available from https://proceedings.mlr.press/v237/awasthi24a.html.