Class-prior Estimation for Learning from Positive and Unlabeled Data

Marthinus Christoffel, Gang Niu, Masashi Sugiyama
Asian Conference on Machine Learning, PMLR 45:221-236, 2016.

Abstract

We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
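The abstract's starting point, fitting a mixture of class-wise distributions to the unlabeled distribution when labeled data from both classes is available, can be illustrated with a toy sketch. This is the fully-labeled baseline the paper departs from, not the paper's penalized PU estimator; the densities, sample sizes, and grid-search fit below are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy 1-D setup: positives ~ N(0,1), negatives ~ N(4,1), true class prior 0.3.
true_prior = 0.3
n = 2000
pos = rng.normal(0.0, 1.0, n)  # labeled positive sample
neg = rng.normal(4.0, 1.0, n)  # labeled negative sample
is_pos = rng.random(n) < true_prior
unl = np.where(is_pos, rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n))

# Kernel density estimates of the class-wise and unlabeled densities.
p_pos, p_neg, p_unl = gaussian_kde(pos), gaussian_kde(neg), gaussian_kde(unl)

# Fit theta * p_pos + (1 - theta) * p_neg to p_unl under the L1 distance,
# approximated by a Riemann sum over a grid of evaluation points.
xs = np.linspace(-5.0, 9.0, 400)
dx = xs[1] - xs[0]
thetas = np.linspace(0.0, 1.0, 101)
l1 = [np.sum(np.abs(t * p_pos(xs) + (1 - t) * p_neg(xs) - p_unl(xs))) * dx
      for t in thetas]
theta_hat = thetas[int(np.argmin(l1))]
print(f"estimated class prior: {theta_hat:.2f}")
```

The paper's setting removes the negative sample `neg`: only `pos` and `unl` are observed, and the penalized divergence is what corrects the bias that mixture fitting would otherwise incur without it.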

Cite this Paper


BibTeX
@InProceedings{pmlr-v45-Christoffel15,
  title     = {Class-prior Estimation for Learning from Positive and Unlabeled Data},
  author    = {Christoffel, Marthinus and Niu, Gang and Sugiyama, Masashi},
  booktitle = {Asian Conference on Machine Learning},
  pages     = {221--236},
  year      = {2016},
  editor    = {Holmes, Geoffrey and Liu, Tie-Yan},
  volume    = {45},
  series    = {Proceedings of Machine Learning Research},
  address   = {Hong Kong},
  month     = {20--22 Nov},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v45/Christoffel15.pdf},
  url       = {https://proceedings.mlr.press/v45/Christoffel15.html},
  abstract  = {We consider the problem of estimating the \emph{class prior} in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized $L_1$-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.}
}
Endnote
%0 Conference Paper
%T Class-prior Estimation for Learning from Positive and Unlabeled Data
%A Marthinus Christoffel
%A Gang Niu
%A Masashi Sugiyama
%B Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Geoffrey Holmes
%E Tie-Yan Liu
%F pmlr-v45-Christoffel15
%I PMLR
%P 221--236
%U https://proceedings.mlr.press/v45/Christoffel15.html
%V 45
%X We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
RIS
TY  - CPAPER
TI  - Class-prior Estimation for Learning from Positive and Unlabeled Data
AU  - Marthinus Christoffel
AU  - Gang Niu
AU  - Masashi Sugiyama
BT  - Asian Conference on Machine Learning
DA  - 2016/02/25
ED  - Geoffrey Holmes
ED  - Tie-Yan Liu
ID  - pmlr-v45-Christoffel15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 45
SP  - 221
EP  - 236
L1  - http://proceedings.mlr.press/v45/Christoffel15.pdf
UR  - https://proceedings.mlr.press/v45/Christoffel15.html
AB  - We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
ER  -
APA
Christoffel, M., Niu, G., & Sugiyama, M. (2016). Class-prior estimation for learning from positive and unlabeled data. Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 45:221-236. Available from https://proceedings.mlr.press/v45/Christoffel15.html.
