Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7134-7144, 2021.

Abstract

To cope with high annotation costs, training a classifier only from weakly supervised data has recently attracted a great deal of attention. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U sets for $m\ge2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which is aimed at predicting from which U set each observed sample is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We build our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.
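
The abstract does not spell out the linear-fractional transformation, so the following is only a minimal sketch derived from Bayes' rule under stated assumptions, not the paper's implementation. It assumes that each U set j is a mixture of the same two class-conditional densities with a known class prior, that set j contributes a known fraction of all samples, and that the test class prior is known. The function name `set_posteriors` and its parameter names are hypothetical. Under these assumptions, the probability that a sample came from set j is a ratio of two linear functions of the binary posterior P(y=+1 | x), so a model that outputs the binary posterior can be trained with an ordinary multi-class loss on the set indices.

```python
def set_posteriors(eta, class_priors, set_fracs, test_prior):
    """Map a binary posterior eta = P(y=+1 | x) to P(set = j | x) for every U set.

    Assumptions (labeled, not taken from the abstract):
      class_priors[j] : P(y=+1) within U set j, assumed known
      set_fracs[j]    : fraction of all samples that belong to U set j
      test_prior      : P(y=+1) in the test distribution, strictly in (0, 1)
    """
    # By Bayes' rule, the unnormalised weight of set j is
    #   rho_j * [(pi_j - pi_D) * eta + pi_D * (1 - pi_j)],
    # which is linear in eta; dividing by the sum over sets (also linear
    # in eta) gives a linear-fractional function of eta.
    weights = [
        rho * ((pi - test_prior) * eta + test_prior * (1.0 - pi))
        for pi, rho in zip(class_priors, set_fracs)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three equally sized U sets with different class priors.
posteriors = set_posteriors(
    eta=0.8,
    class_priors=[0.9, 0.5, 0.1],
    set_fracs=[1 / 3, 1 / 3, 1 / 3],
    test_prior=0.5,
)
print(posteriors)
```

In a training loop, `eta` would come from the binary model's output, and the returned list would be fed to a cross-entropy loss against the index of the U set each sample was drawn from. At test time only the binary output is used.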

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-lu21c,
  title = {Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification},
  author = {Lu, Nan and Lei, Shida and Niu, Gang and Sato, Issei and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {7134--7144},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/lu21c/lu21c.pdf},
  url = {https://proceedings.mlr.press/v139/lu21c.html},
  abstract = {To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U-sets for $m\ge2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which is aimed at predicting from which U set each observed sample is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We built our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.}
}
Endnote
%0 Conference Paper
%T Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification
%A Nan Lu
%A Shida Lei
%A Gang Niu
%A Issei Sato
%A Masashi Sugiyama
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-lu21c
%I PMLR
%P 7134--7144
%U https://proceedings.mlr.press/v139/lu21c.html
%V 139
%X To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing risk-consistent methods are theoretically grounded with high flexibility, they can learn only from two U sets. In this paper, we propose a new approach for binary classification from $m$ U-sets for $m\ge2$. Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC), which is aimed at predicting from which U set each observed sample is drawn. SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation. We built our method in a flexible and efficient end-to-end deep learning framework and prove it to be classifier-consistent. Through experiments, we demonstrate the superiority of our proposed method over state-of-the-art methods.
APA
Lu, N., Lei, S., Niu, G., Sato, I. & Sugiyama, M. (2021). Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7134-7144. Available from https://proceedings.mlr.press/v139/lu21c.html.
