Learning from Positive and Unlabeled Data under the Selected At Random Assumption

Jessa Bekker, Jesse Davis
Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 94:8-22, 2018.

Abstract

For many interesting tasks, such as medical diagnosis and web page classification, a learner only has access to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about the true distribution of the classes and/or the mechanism that was used to select the positive examples to be labeled. The commonly made assumptions, namely separability of the classes and selection of the positive examples completely at random, are very strong. This paper proposes a weaker assumption: that positive examples are selected at random, conditioned on some of the attributes. To learn under this assumption, an EM method is proposed. Experiments show that our method not only learns well under this assumption, but also outperforms the state of the art for learning under the selected completely at random assumption.
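To make the contrast between the two labeling assumptions concrete, the following is a minimal formalization. The notation is chosen here for illustration and is not taken verbatim from the paper: s is the indicator that an example is labeled, y is the true class, c is a constant label frequency, e is a propensity function, and x_e is the subset of attributes the labeling may depend on.

% Selected Completely At Random (SCAR): every positive example has the
% same probability of being labeled, independently of its attributes x.
\Pr(s = 1 \mid y = 1, x) = c

% Selected At Random (SAR), the weaker assumption considered here: the
% labeling probability may depend on a subset x_e of the attributes,
% but given x_e it is independent of the remaining attributes.
\Pr(s = 1 \mid y = 1, x) = e(x_e)

Under SCAR the labeled positives are an unbiased sample of all positives, whereas under SAR they may be biased toward certain attribute values, which is what the proposed EM method has to account for.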

Cite this Paper


BibTeX
@InProceedings{pmlr-v94-bekker18a,
  title     = {Learning from Positive and Unlabeled Data under the Selected At Random Assumption},
  author    = {Bekker, Jessa and Davis, Jesse},
  booktitle = {Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications},
  pages     = {8--22},
  year      = {2018},
  editor    = {Torgo, Luís and Matwin, Stan and Japkowicz, Nathalie and Krawczyk, Bartosz and Moniz, Nuno and Branco, Paula},
  volume    = {94},
  series    = {Proceedings of Machine Learning Research},
  month     = {10 Sep},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v94/bekker18a/bekker18a.pdf},
  url       = {https://proceedings.mlr.press/v94/bekker18a.html}
}
Endnote
%0 Conference Paper
%T Learning from Positive and Unlabeled Data under the Selected At Random Assumption
%A Jessa Bekker
%A Jesse Davis
%B Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications
%C Proceedings of Machine Learning Research
%D 2018
%E Luís Torgo
%E Stan Matwin
%E Nathalie Japkowicz
%E Bartosz Krawczyk
%E Nuno Moniz
%E Paula Branco
%F pmlr-v94-bekker18a
%I PMLR
%P 8--22
%U https://proceedings.mlr.press/v94/bekker18a.html
%V 94
APA
Bekker, J. & Davis, J. (2018). Learning from Positive and Unlabeled Data under the Selected At Random Assumption. Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, in Proceedings of Machine Learning Research 94:8-22. Available from https://proceedings.mlr.press/v94/bekker18a.html.

Related Material