Convex Formulation for Learning from Positive and Unlabeled Data

Marthinus Du Plessis; Gang Niu; Masashi Sugiyama

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1386-1394, 2015.

Abstract

We discuss binary classification from only positive and unlabeled data (PU classification), which arises in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.
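The abstract does not spell out the losses, so the following is only a minimal sketch under assumptions taken from our reading of the paper rather than from this page: a loss l is treated as permissible when l(z) - l(-z) = -z, the double hinge loss is taken to be l_DH(z) = max(-z, max(0, (1 - z)/2)), and the class prior pi (the parameter `prior`) is assumed known. Substituting that condition into the unbiased PU risk gives the empirical objective -pi * mean(g(x_P)) + mean(l(-g(x_U))), whose positive-sample term is linear in the classifier outputs g, so the objective stays convex whenever l is convex. All function names are hypothetical.

```python
# Hypothetical sketch of a convex PU risk with a double hinge loss.
# Assumed (not stated on this page): the loss form, the condition
# l(z) - l(-z) = -z, and a known class prior `prior`.
from typing import Callable, Sequence


def double_hinge(z: float) -> float:
    """Assumed double hinge loss: max(-z, max(0, (1 - z) / 2)); convex."""
    return max(-z, max(0.0, (1.0 - z) / 2.0))


def hinge(z: float) -> float:
    """Standard hinge loss max(0, 1 - z), included only for comparison."""
    return max(0.0, 1.0 - z)


def is_linear_odd(loss: Callable[[float], float],
                  zs: Sequence[float], tol: float = 1e-12) -> bool:
    """Check the assumed condition l(z) - l(-z) == -z on sample margins."""
    return all(abs(loss(z) - loss(-z) + z) <= tol for z in zs)


def convex_pu_risk(g_pos: Sequence[float], g_unl: Sequence[float],
                   prior: float,
                   loss: Callable[[float], float] = double_hinge) -> float:
    """Empirical risk: -prior * mean(g(x_P)) + mean(loss(-g(x_U))).

    g_pos / g_unl are classifier outputs g(x) on positive / unlabeled
    samples; the positive term is linear in g, so the objective is
    convex in the outputs whenever `loss` is convex.
    """
    pos_term = -prior * sum(g_pos) / len(g_pos)
    unl_term = sum(loss(-g) for g in g_unl) / len(g_unl)
    return pos_term + unl_term


if __name__ == "__main__":
    grid = [i / 4 for i in range(-12, 13)]  # margins from -3.0 to 3.0
    print(is_linear_odd(double_hinge, grid))  # True
    print(is_linear_odd(hinge, grid))         # False
    print(convex_pu_risk([1.0, 2.0], [0.5, -1.5], prior=0.4))
```

The grid check illustrates, under the assumed condition, why the plain hinge loss is not permissible: for |z| <= 1 it gives l(z) - l(-z) = -2z instead of -z, whereas the assumed double hinge form satisfies the condition at every margin.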

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-plessis15,
  title     = {Convex Formulation for Learning from Positive and Unlabeled Data},
  author    = {Plessis, Marthinus Du and Niu, Gang and Sugiyama, Masashi},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {1386--1394},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v37/plessis15.pdf},
  url       = {https://proceedings.mlr.press/v37/plessis15.html},
  abstract  = {We discuss binary classification from only positive and unlabeled data (PU classification), which arises in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.}
}

Endnote

%0 Conference Paper
%T Convex Formulation for Learning from Positive and Unlabeled Data
%A Marthinus Du Plessis
%A Gang Niu
%A Masashi Sugiyama
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-plessis15
%I PMLR
%P 1386--1394
%U https://proceedings.mlr.press/v37/plessis15.html
%V 37
%X We discuss binary classification from only positive and unlabeled data (PU classification), which arises in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.

RIS


TY  - CPAPER
TI  - Convex Formulation for Learning from Positive and Unlabeled Data
AU  - Marthinus Du Plessis
AU  - Gang Niu
AU  - Masashi Sugiyama
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-plessis15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 1386
EP  - 1394
L1  - http://proceedings.mlr.press/v37/plessis15.pdf
UR  - https://proceedings.mlr.press/v37/plessis15.html
AB  - We discuss binary classification from only positive and unlabeled data (PU classification), which arises in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.
ER  -

APA


Plessis, M.D., Niu, G. & Sugiyama, M. (2015). Convex Formulation for Learning from Positive and Unlabeled Data. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:1386-1394. Available from https://proceedings.mlr.press/v37/plessis15.html.
