Class-prior Estimation for Learning from Positive and Unlabeled Data

Marthinus Christoffel; Gang Niu; Masashi Sugiyama

Asian Conference on Machine Learning, PMLR 45:221-236, 2016.

Abstract

We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
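To make the baseline setting in the abstract's second sentence concrete, the following is a minimal sketch of class-prior estimation when labeled samples from both classes are available. It is not the paper's penalized L_1-distance method: it assumes one-dimensional data, histogram density estimates, and a squared-distance fit, chosen only because that fit has a simple closed form. All function names, parameters, and the synthetic Gaussian data are hypothetical.

```python
import random


def histogram(samples, lo, hi, bins):
    """Return bin frequencies that sum to 1; out-of-range samples go to edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp into the valid bin range
        counts[idx] += 1
    n = len(samples)
    return [c / n for c in counts]


def fit_class_prior(pos, neg, unlabeled, lo=-6.0, hi=8.0, bins=40):
    """Fit theta in the mixture theta * p(x|y=+1) + (1 - theta) * p(x|y=-1)
    to the unlabeled histogram by least squares (an illustrative choice)."""
    h_p = histogram(pos, lo, hi, bins)
    h_n = histogram(neg, lo, hi, bins)
    h_u = histogram(unlabeled, lo, hi, bins)
    # Minimizing sum_b (h_u[b] - theta*h_p[b] - (1-theta)*h_n[b])^2 over theta
    # gives theta = <h_u - h_n, h_p - h_n> / ||h_p - h_n||^2.
    num = sum((u - n) * (p - n) for u, p, n in zip(h_u, h_p, h_n))
    den = sum((p - n) ** 2 for p, n in zip(h_p, h_n))
    theta = num / den
    return min(max(theta, 0.0), 1.0)  # a class prior must lie in [0, 1]


# Synthetic demo: positives ~ N(2, 1), negatives ~ N(-2, 1), true prior 0.3.
rng = random.Random(0)
true_prior = 0.3
pos = [rng.gauss(2.0, 1.0) for _ in range(2000)]
neg = [rng.gauss(-2.0, 1.0) for _ in range(2000)]
unlabeled = [
    rng.gauss(2.0, 1.0) if rng.random() < true_prior else rng.gauss(-2.0, 1.0)
    for _ in range(5000)
]
theta_hat = fit_class_prior(pos, neg, unlabeled)
print(f"estimated class prior: {theta_hat:.3f} (true value {true_prior})")
```

This sketch depends on the negative samples in `neg`. The setting studied in the paper removes them, which is the gap the penalized divergences are said to address.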

Cite this Paper

BibTeX


@InProceedings{pmlr-v45-Christoffel15,
  title = 	 {Class-prior Estimation for Learning from Positive and Unlabeled Data},
  author = 	 {Christoffel, Marthinus and Niu, Gang and Sugiyama, Masashi},
  booktitle = 	 {Asian Conference on Machine Learning},
  pages = 	 {221--236},
  year = 	 {2016},
  editor = 	 {Holmes, Geoffrey and Liu, Tie-Yan},
  volume = 	 {45},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Hong Kong},
  month = 	 {20--22 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v45/Christoffel15.pdf},
  url = 	 {https://proceedings.mlr.press/v45/Christoffel15.html},
  abstract = 	 {We consider the problem of estimating the \emph{class prior} in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.}
}

Endnote

%0 Conference Paper
%T Class-prior Estimation for Learning from Positive and Unlabeled Data
%A Marthinus Christoffel
%A Gang Niu
%A Masashi Sugiyama
%B Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Geoffrey Holmes
%E Tie-Yan Liu
%F pmlr-v45-Christoffel15
%I PMLR
%P 221--236
%U https://proceedings.mlr.press/v45/Christoffel15.html
%V 45
%X We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.

RIS


TY  - CPAPER
TI  - Class-prior Estimation for Learning from Positive and Unlabeled Data
AU  - Marthinus Christoffel
AU  - Gang Niu
AU  - Masashi Sugiyama
BT  - Asian Conference on Machine Learning
DA  - 2016/02/25
ED  - Geoffrey Holmes
ED  - Tie-Yan Liu
ID  - pmlr-v45-Christoffel15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 45
SP  - 221
EP  - 236
L1  - http://proceedings.mlr.press/v45/Christoffel15.pdf
UR  - https://proceedings.mlr.press/v45/Christoffel15.html
AB  - We consider the problem of estimating the class prior in an unlabeled dataset. Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution. However, in practice, such an additional labeled dataset is often not available. In this paper, we show that, with additional samples coming only from the positive class, the class prior of the unlabeled dataset can be estimated correctly. Our key idea is to use properly penalized divergences for model fitting to cancel the error caused by the absence of negative samples. We further show that the use of the penalized L_1-distance gives a computationally efficient algorithm with an analytic solution, and establish its uniform deviation bound and estimation error bound. Finally, we experimentally demonstrate the usefulness of the proposed method.
ER  -

APA


Christoffel, M., Niu, G. & Sugiyama, M. (2016). Class-prior Estimation for Learning from Positive and Unlabeled Data. Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 45:221-236. Available from https://proceedings.mlr.press/v45/Christoffel15.html.
