Classification from Positive, Unlabeled and Biased Negative Data

Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2820-2829, 2019.

Abstract

In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets.
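The abstract does not spell out the PUbN estimator, but it says the preliminary weighting step draws on PU learning. For background only, here is a minimal sketch of a standard building block from prior PU-learning work: the non-negative PU (nnPU) empirical risk of Kiryo et al. (2017). It assumes a known class prior `prior` and the sigmoid loss. The function names `sigmoid_loss` and `nnpu_risk` are hypothetical. This is not the authors' PUbN algorithm or its example weights.

```python
import numpy as np


def sigmoid_loss(z, y):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)); near 0 when y * z >> 0."""
    return 1.0 / (1.0 + np.exp(y * z))


def nnpu_risk(scores_p, scores_u, prior):
    """Non-negative PU empirical risk (background sketch, not the PUbN estimator).

    scores_p: outputs g(x) of a classifier on labeled positive (P) examples
    scores_u: outputs g(x) on unlabeled (U) examples
    prior:    class prior pi = p(y = +1), assumed known here
    """
    # Positive-class risk from P data: pi * E_P[l(g(x), +1)]
    risk_pos = prior * np.mean(sigmoid_loss(scores_p, +1))
    # Negative-class risk rewritten with P and U data only:
    # (1 - pi) * E_N[l(g(x), -1)] = E_U[l(g(x), -1)] - pi * E_P[l(g(x), -1)]
    risk_neg = np.mean(sigmoid_loss(scores_u, -1)) - prior * np.mean(sigmoid_loss(scores_p, -1))
    # The true negative-class risk cannot be negative, so clip the estimate at 0
    return float(risk_pos + max(0.0, risk_neg))


# Usage: a constant-zero classifier has loss 0.5 everywhere, so the risk is 0.5
print(nnpu_risk(np.zeros(5), np.zeros(20), prior=0.4))
```

The clipping at zero is what makes the estimator non-negative: without it, a flexible model can push the U-minus-P term below zero and overfit.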

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-hsieh19c,
  title     = {Classification from Positive, Unlabeled and Biased Negative Data},
  author    = {Hsieh, Yu-Guan and Niu, Gang and Sugiyama, Masashi},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {2820--2829},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/hsieh19c/hsieh19c.pdf},
  url       = {https://proceedings.mlr.press/v97/hsieh19c.html},
  abstract  = {In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets.}
}
Endnote
%0 Conference Paper
%T Classification from Positive, Unlabeled and Biased Negative Data
%A Yu-Guan Hsieh
%A Gang Niu
%A Masashi Sugiyama
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-hsieh19c
%I PMLR
%P 2820--2829
%U https://proceedings.mlr.press/v97/hsieh19c.html
%V 97
%X In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that contains only a small portion of all possible N data can often be much easier in practice. This paper studies a novel classification framework which incorporates such biased N (bN) data in PU learning. We provide a method based on empirical risk minimization to address this PUbN classification problem. Our approach can be regarded as a novel example-weighting algorithm, with the weight of each example computed through a preliminary step that draws inspiration from PU learning. We also derive an estimation error bound for the proposed method. Experimental results demonstrate the effectiveness of our algorithm in not only PUbN learning scenarios but also ordinary PU learning scenarios on several benchmark datasets.
APA
Hsieh, Y., Niu, G., & Sugiyama, M. (2019). Classification from Positive, Unlabeled and Biased Negative Data. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:2820-2829. Available from https://proceedings.mlr.press/v97/hsieh19c.html.
