Classification from Pairwise Similarity and Unlabeled Data

Han Bao, Gang Niu, Masashi Sugiyama
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:452-461, 2018.

Abstract

Supervised learning needs a huge amount of labeled data, which can be a major bottleneck when there are privacy concerns or labeling is costly. To overcome this problem, we propose a new weakly-supervised learning setting, called SU classification, in which only similar (S) data pairs (two examples that belong to the same class) and unlabeled (U) data points are needed instead of fully labeled data. We show that an unbiased estimator of the classification risk can be obtained from SU data alone, and that the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.
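To show how a risk estimate can come from SU data alone, here is a minimal sketch. It is our own derivation and is not copied from the paper. It assumes labels in {+1, -1}, a known class prior π_+ ≠ 1/2, and logistic loss. Write π_- = 1 − π_+ and π_S = π_+² + π_-². Each point of a similar pair then has marginal density p̃_S = (π_+² p_+ + π_-² p_-)/π_S, and unlabeled points follow p = π_+ p_+ + π_- p_-. Solving these two equations gives π_+ p_+ = (π_S p̃_S − π_- p)/(2π_+ − 1) and π_- p_- = (π_+ p − π_S p̃_S)/(2π_+ − 1). Substituting both into R(f) = π_+ E_+[ℓ(f(x), +1)] + π_- E_-[ℓ(f(x), −1)] expresses the risk using only S and U expectations. The names `su_risk`, `logistic_loss` and `sample_gaussian_su` are hypothetical, and so is the Gaussian toy data.

```python
import math
import random


def logistic_loss(z: float, y: int) -> float:
    """Logistic loss log(1 + exp(-y * z)), computed without overflow."""
    m = -y * z
    if m > 0:
        return m + math.log1p(math.exp(-m))
    return math.log1p(math.exp(m))


def su_risk(f, s_pairs, u_points, prior_pos, loss=logistic_loss):
    """Estimate the binary classification risk of f from S pairs and U points.

    Sketch only: assumes labels in {+1, -1} and a known class prior
    pi_+ = prior_pos that differs from 1/2.
    """
    pp, pn = prior_pos, 1.0 - prior_pos
    denom = 2.0 * pp - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("the estimator is undefined when the class prior is 1/2")
    pi_s = pp ** 2 + pn ** 2  # probability that two random points share a class

    def loss_s(z):  # loss applied to each member of a similar pair
        return (loss(z, +1) - loss(z, -1)) / denom

    def loss_u(z):  # loss applied to each unlabeled point
        return (-pn * loss(z, +1) + pp * loss(z, -1)) / denom

    # Each member of a pair is marginally a draw from p~_S, so average the two.
    s_term = sum((loss_s(f(x)) + loss_s(f(xp))) / 2.0 for x, xp in s_pairs) / len(s_pairs)
    u_term = sum(loss_u(f(x)) for x in u_points) / len(u_points)
    return pi_s * s_term + u_term


def sample_gaussian_su(n_pairs, n_unlabeled, prior_pos, rng, mu=1.0):
    """Hypothetical toy data: class-conditionals N(+mu, 1) and N(-mu, 1)."""
    pp, pn = prior_pos, 1.0 - prior_pos
    p_pos_pair = pp ** 2 / (pp ** 2 + pn ** 2)  # P(both positive | similar pair)
    pairs = []
    for _ in range(n_pairs):
        m = mu if rng.random() < p_pos_pair else -mu
        pairs.append((rng.gauss(m, 1.0), rng.gauss(m, 1.0)))
    unlabeled = [rng.gauss(mu if rng.random() < pp else -mu, 1.0)
                 for _ in range(n_unlabeled)]
    return pairs, unlabeled
```

With a linear scorer f(x) = x and π_+ = 0.7, the SU estimate should agree with the ordinary risk computed from labeled samples of the same toy distribution, up to sampling error. The factor 1/(2π_+ − 1) shows why the sketch requires π_+ ≠ 1/2. It also suggests that the estimate becomes noisier as the prior approaches 1/2.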

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-bao18a,
  title     = {Classification from Pairwise Similarity and Unlabeled Data},
  author    = {Bao, Han and Niu, Gang and Sugiyama, Masashi},
  booktitle = {Proceedings of the 35th International Conference on Machine Learning},
  pages     = {452--461},
  year      = {2018},
  editor    = {Dy, Jennifer and Krause, Andreas},
  volume    = {80},
  series    = {Proceedings of Machine Learning Research},
  month     = {10--15 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v80/bao18a/bao18a.pdf},
  url       = {http://proceedings.mlr.press/v80/bao18a.html},
  abstract  = {Supervised learning needs a huge amount of labeled data, which can be a big bottleneck under the situation where there is a privacy concern or labeling cost is high. To overcome this problem, we propose a new weakly-supervised learning setting where only similar (S) data pairs (two examples belong to the same class) and unlabeled (U) data points are needed instead of fully labeled data, which is called SU classification. We show that an unbiased estimator of the classification risk can be obtained only from SU data, and the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.}
}
Endnote
%0 Conference Paper
%T Classification from Pairwise Similarity and Unlabeled Data
%A Han Bao
%A Gang Niu
%A Masashi Sugiyama
%B Proceedings of the 35th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jennifer Dy
%E Andreas Krause
%F pmlr-v80-bao18a
%I PMLR
%P 452--461
%U http://proceedings.mlr.press/v80/bao18a.html
%V 80
%X Supervised learning needs a huge amount of labeled data, which can be a big bottleneck under the situation where there is a privacy concern or labeling cost is high. To overcome this problem, we propose a new weakly-supervised learning setting where only similar (S) data pairs (two examples belong to the same class) and unlabeled (U) data points are needed instead of fully labeled data, which is called SU classification. We show that an unbiased estimator of the classification risk can be obtained only from SU data, and the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.
APA
Bao, H., Niu, G., & Sugiyama, M. (2018). Classification from Pairwise Similarity and Unlabeled Data. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:452-461. Available from http://proceedings.mlr.press/v80/bao18a.html.
