Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data

Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2998-3006, 2017.

Abstract

Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised learning approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised learning methods. Through experiments, we demonstrate the usefulness of the proposed methods.

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-sakai17a,
  title = {Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data},
  author = {Tomoya Sakai and Marthinus Christoffel du Plessis and Gang Niu and Masashi Sugiyama},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages = {2998--3006},
  year = {2017},
  editor = {Precup, Doina and Teh, Yee Whye},
  volume = {70},
  series = {Proceedings of Machine Learning Research},
  month = {06--11 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v70/sakai17a/sakai17a.pdf},
  url = {https://proceedings.mlr.press/v70/sakai17a.html},
  abstract = {Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised learning approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised learning methods. Through experiments, we demonstrate the usefulness of the proposed methods.}
}
Endnote
%0 Conference Paper
%T Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data
%A Tomoya Sakai
%A Marthinus Christoffel du Plessis
%A Gang Niu
%A Masashi Sugiyama
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-sakai17a
%I PMLR
%P 2998--3006
%U https://proceedings.mlr.press/v70/sakai17a.html
%V 70
%X Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised learning approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised learning methods. Through experiments, we demonstrate the usefulness of the proposed methods.
APA
Sakai, T., du Plessis, M. C., Niu, G. & Sugiyama, M. (2017). Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2998-3006. Available from https://proceedings.mlr.press/v70/sakai17a.html.
