Multi-class Classification from Multiple Unlabeled Datasets with Partial Risk Regularization
Proceedings of The 14th Asian Conference on Machine
Learning, PMLR 189:990-1005, 2023.
Abstract
Recent years have witnessed the great success of
supervised deep learning, where predictive models
are trained from a large amount of fully labeled
data. However, in practice, labeling such big data
can be very costly and may not even be possible for
privacy reasons. Therefore, in this paper, we aim to
learn an accurate classifier without any class
labels. More specifically, we consider the case
where multiple sets of unlabeled data and only their
class priors, i.e., the proportions of each class,
are available. Under this problem setup, we first
derive an unbiased estimator of the classification
risk that can be computed from the given unlabeled
sets and theoretically analyze the generalization
error of the learned classifier. We then find that
the classifier obtained in this way tends to
overfit, as its empirical risks go negative
during training. To prevent overfitting, we further
propose a partial risk regularization that maintains
the partial risks with respect to unlabeled datasets
and classes at certain levels. Experiments
demonstrate that our method effectively mitigates
overfitting and outperforms state-of-the-art methods
for learning from multiple unlabeled sets.
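
To make the setup concrete, the following is a minimal sketch of how a classification risk might be rewritten in terms of unlabeled sets with known class priors. It is not taken from the paper and may differ from the estimator derived there. It assumes that each unlabeled set is a mixture of the class-conditional distributions, that the matrix of class priors has full column rank, and that per-example losses against every candidate label are already available. The names `unbiased_risk`, `theta`, and `class_priors` are hypothetical.

```python
import numpy as np

def unbiased_risk(losses_per_set, theta, class_priors):
    """Illustrative risk estimate from unlabeled sets (assumed form, not the paper's).

    losses_per_set: list of M arrays of shape (n_j, K); entry [i, k] is the
        loss of the classifier on example i of set j against label k.
    theta: (M, K) array; theta[j, k] is the proportion of class k in set j.
        Assumed to have full column rank (M >= K).
    class_priors: (K,) array of class priors of the target distribution.
    """
    # mean_loss[j, k] approximates E_{x ~ set j}[loss(f(x), k)].
    mean_loss = np.stack(
        [np.asarray(l, dtype=float).mean(axis=0) for l in losses_per_set]
    )
    # Each set is a mixture, so mean_loss = theta @ per_class in expectation,
    # where per_class[c, k] = E_{x | y = c}[loss(f(x), k)]. Solve for per_class.
    per_class = np.linalg.pinv(theta) @ mean_loss
    # Classification risk: sum_k prior_k * E_{x | y = k}[loss(f(x), k)].
    return float(np.dot(class_priors, np.diag(per_class)))

# Toy check with two classes and two unlabeled sets.
theta = np.array([[0.8, 0.2], [0.3, 0.7]])
class_priors = np.array([0.5, 0.5])
row_c0 = [0.1, 2.0]  # losses of a class-0 example against labels 0 and 1
row_c1 = [1.5, 0.2]  # losses of a class-1 example against labels 0 and 1
set1 = np.array([row_c0] * 8 + [row_c1] * 2)  # matches theta[0]
set2 = np.array([row_c0] * 3 + [row_c1] * 7)  # matches theta[1]
# The labeled risk in this construction is 0.5 * 0.1 + 0.5 * 0.2 = 0.15.
risk = unbiased_risk([set1, set2], theta, class_priors)
print(risk)
```

Because the pseudo-inverse of `theta` generally contains negative entries, an estimate of this form can fall below zero on finite samples even when every individual loss is nonnegative. This is consistent with the negative empirical risks that the abstract describes.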