SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

Chaoqun Du, Yizeng Han, Gao Huang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:11686-11703, 2024.

Abstract

Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) method by separating the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for class distribution estimation during the maximization phase, leading to the formulation of a Bayes classifier. The Bayes classifier, in turn, enhances the quality of pseudo-labels in the expectation phase. Remarkably, the SimPro framework is not only straightforward to implement but also comes with theoretical guarantees. Moreover, we introduce two novel class distributions broadening the scope of the evaluation. Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios. Our code is available at https://github.com/LeapLabTHU/SimPro.
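
The alternation the abstract describes (soft pseudo-labels in the expectation phase, a closed-form class distribution update in the maximization phase) can be illustrated with a generic EM prior re-estimation under label shift. This is a minimal sketch, not the SimPro implementation: the function names, the fixed classifier posteriors, and the uniform initialization are all assumptions made for illustration, and the actual method (which also trains the network) is in the linked repository.

```python
# Generic EM re-estimation of an unknown class prior on unlabeled data.
# Hypothetical sketch: NOT the authors' code. It assumes a fixed classifier
# whose posteriors were learned under a known labeled-data prior, and it
# assumes every entry of source_prior is strictly positive.

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def e_step(posteriors, source_prior, target_prior):
    """Expectation phase: re-weight each posterior by target/source prior
    ratio and renormalize, giving soft pseudo-labels under the current
    estimate of the unlabeled class distribution."""
    ratio = [t / s for t, s in zip(target_prior, source_prior)]
    return [normalize([p * r for p, r in zip(row, ratio)]) for row in posteriors]

def m_step(soft_labels):
    """Maximization phase: the closed-form prior update is the average of
    the soft pseudo-labels over all unlabeled samples."""
    n = len(soft_labels)
    k = len(soft_labels[0])
    return [sum(row[c] for row in soft_labels) / n for c in range(k)]

def estimate_unlabeled_prior(posteriors, source_prior, n_iters=50):
    """Alternate E and M steps, starting from a uniform prior estimate."""
    k = len(source_prior)
    target_prior = [1.0 / k] * k
    for _ in range(n_iters):
        soft = e_step(posteriors, source_prior, target_prior)
        target_prior = m_step(soft)
    return target_prior
```

Here `posteriors` is a list of per-sample class probability rows and `source_prior` is the class distribution of the labeled set; the returned list is the estimated class distribution of the unlabeled set, with no shape assumed in advance.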

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-du24b,
  title = {{S}im{P}ro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning},
  author = {Du, Chaoqun and Han, Yizeng and Huang, Gao},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {11686--11703},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/du24b/du24b.pdf},
  url = {https://proceedings.mlr.press/v235/du24b.html},
  abstract = {Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) method by separating the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for class distribution estimation during the maximization phase, leading to the formulation of a Bayes classifier. The Bayes classifier, in turn, enhances the quality of pseudo-labels in the expectation phase. Remarkably, the SimPro framework is not only straightforward to implement but also comes with theoretical guarantees. Moreover, we introduce two novel class distributions broadening the scope of the evaluation. Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios. Our code is available at https://github.com/LeapLabTHU/SimPro.}
}
Endnote
%0 Conference Paper
%T SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
%A Chaoqun Du
%A Yizeng Han
%A Gao Huang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-du24b
%I PMLR
%P 11686--11703
%U https://proceedings.mlr.press/v235/du24b.html
%V 235
%X Recent advancements in semi-supervised learning have focused on a more realistic yet challenging task: addressing imbalances in labeled data while the class distribution of unlabeled data remains both unknown and potentially mismatched. Current approaches in this sphere often presuppose rigid assumptions regarding the class distribution of unlabeled data, thereby limiting the adaptability of models to only certain distribution ranges. In this study, we propose a novel approach, introducing a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data. Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization (EM) method by separating the modeling of conditional and marginal class distributions. This separation facilitates a closed-form solution for class distribution estimation during the maximization phase, leading to the formulation of a Bayes classifier. The Bayes classifier, in turn, enhances the quality of pseudo-labels in the expectation phase. Remarkably, the SimPro framework is not only straightforward to implement but also comes with theoretical guarantees. Moreover, we introduce two novel class distributions broadening the scope of the evaluation. Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios. Our code is available at https://github.com/LeapLabTHU/SimPro.
APA
Du, C., Han, Y. & Huang, G. (2024). SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:11686-11703. Available from https://proceedings.mlr.press/v235/du24b.html.

Related Material