Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data

Jianjun Xie, Tao Xiong
Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, PMLR 16:85-98, 2011.

Abstract

In this paper, we describe the stochastic semi-supervised learning approach that we used in our submission to all six tasks in the 2009-2010 Active Learning Challenge. The method is designed to tackle the binary classification problem under the condition that the number of labeled data points is extremely small and the two classes are highly imbalanced. It starts with only one positive seed given by the contest organizer. We randomly pick additional unlabeled data points and treat them as “negative” seeds based on the fact that the positive label is rare across all datasets. A classifier is trained using the “labeled” data points and is then used to predict the unlabeled dataset. We take the final result to be the average of n stochastic iterations. Supervised learning was used once a large number of labels had been purchased. Our approach is shown to work well in 5 out of 6 datasets. The overall results ranked 3rd in the contest.
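
The procedure summarized in the abstract (one positive seed, randomly drawn pseudo-negatives, a classifier trained on them, scores averaged over n iterations) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the base classifier, so a simple distance-based scorer stands in for it, and the function names, the toy data, and the parameters `n_iterations` and `n_negatives` are hypothetical choices, not the authors' settings.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stochastic_scores(unlabeled, positive_seed, n_iterations=50,
                      n_negatives=10, seed=0):
    """Average a positive-likeness score over n_iterations random draws.

    Each iteration samples n_negatives unlabeled points and treats them as
    pseudo-negatives, relying on positives being rare. The stand-in
    classifier (an assumption, not the paper's model) scores a point as its
    squared distance to the pseudo-negative centroid minus its squared
    distance to the positive seed, so a higher score means "more positive".
    """
    rng = random.Random(seed)
    totals = [0.0] * len(unlabeled)
    for _ in range(n_iterations):
        negatives = rng.sample(unlabeled, n_negatives)
        neg_center = centroid(negatives)
        for i, x in enumerate(unlabeled):
            totals[i] += sq_dist(x, neg_center) - sq_dist(x, positive_seed)
    return [t / n_iterations for t in totals]


def make_toy_data(n_neg=200, n_pos=10, seed=1):
    """Hypothetical imbalanced 2-D data: many negatives, few positives."""
    rng = random.Random(seed)
    neg = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n_neg)]
    pos = [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(n_pos)]
    return neg + pos, [0] * n_neg + [1] * n_pos


if __name__ == "__main__":
    data, labels = make_toy_data()
    scores = stochastic_scores(data, positive_seed=[4.0, 4.0])
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    # Inspect how many of the top-ranked points are true positives.
    print(sum(labels[i] for i in ranked[:10]), "positives in the top 10")
```

Averaging over many random draws is what makes the random negative seeds tolerable: an occasional true positive sampled as a pseudo-negative perturbs one iteration but has little effect on the mean score.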

Cite this Paper


BibTeX
@InProceedings{pmlr-v16-xie11a,
  title = {Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data},
  author = {Xie, Jianjun and Xiong, Tao},
  booktitle = {Active Learning and Experimental Design workshop In conjunction with AISTATS 2010},
  pages = {85--98},
  year = {2011},
  editor = {Guyon, Isabelle and Cawley, Gavin and Dror, Gideon and Lemaire, Vincent and Statnikov, Alexander},
  volume = {16},
  series = {Proceedings of Machine Learning Research},
  address = {Sardinia, Italy},
  month = {16 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v16/xie11a/xie11a.pdf},
  url = {https://proceedings.mlr.press/v16/xie11a.html},
  abstract = {In this paper, we describe the stochastic semi-supervised learning approach that we used in our submission to all six tasks in 2009-2010 Active Learning Challenge. The method is designed to tackle the binary classification problem under the condition that the number of labeled data points is extremely small and the two classes are highly imbalanced. It starts with only one positive seed given by the contest organizer. We randomly pick additional unlabeled data points and treat them as “negative” seeds based on the fact that the positive label is rare across all datasets. A classifier is trained using the “labeled” data points and then is used to predict the unlabeled dataset. We take the final result to be the average of n stochastic iterations. Supervised learning was used as a large number of labels were purchased. Our approach is shown to work well in 5 out of 6 datasets. The overall results ranked 3rd in the contest.}
}
Endnote
%0 Conference Paper
%T Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data
%A Jianjun Xie
%A Tao Xiong
%B Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
%C Proceedings of Machine Learning Research
%D 2011
%E Isabelle Guyon
%E Gavin Cawley
%E Gideon Dror
%E Vincent Lemaire
%E Alexander Statnikov
%F pmlr-v16-xie11a
%I PMLR
%P 85--98
%U https://proceedings.mlr.press/v16/xie11a.html
%V 16
%X In this paper, we describe the stochastic semi-supervised learning approach that we used in our submission to all six tasks in 2009-2010 Active Learning Challenge. The method is designed to tackle the binary classification problem under the condition that the number of labeled data points is extremely small and the two classes are highly imbalanced. It starts with only one positive seed given by the contest organizer. We randomly pick additional unlabeled data points and treat them as “negative” seeds based on the fact that the positive label is rare across all datasets. A classifier is trained using the “labeled” data points and then is used to predict the unlabeled dataset. We take the final result to be the average of n stochastic iterations. Supervised learning was used as a large number of labels were purchased. Our approach is shown to work well in 5 out of 6 datasets. The overall results ranked 3rd in the contest.
RIS
TY - CPAPER
TI - Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data
AU - Jianjun Xie
AU - Tao Xiong
BT - Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
DA - 2011/04/21
ED - Isabelle Guyon
ED - Gavin Cawley
ED - Gideon Dror
ED - Vincent Lemaire
ED - Alexander Statnikov
ID - pmlr-v16-xie11a
PB - PMLR
DP - Proceedings of Machine Learning Research
VL - 16
SP - 85
EP - 98
L1 - http://proceedings.mlr.press/v16/xie11a/xie11a.pdf
UR - https://proceedings.mlr.press/v16/xie11a.html
AB - In this paper, we describe the stochastic semi-supervised learning approach that we used in our submission to all six tasks in 2009-2010 Active Learning Challenge. The method is designed to tackle the binary classification problem under the condition that the number of labeled data points is extremely small and the two classes are highly imbalanced. It starts with only one positive seed given by the contest organizer. We randomly pick additional unlabeled data points and treat them as “negative” seeds based on the fact that the positive label is rare across all datasets. A classifier is trained using the “labeled” data points and then is used to predict the unlabeled dataset. We take the final result to be the average of n stochastic iterations. Supervised learning was used as a large number of labels were purchased. Our approach is shown to work well in 5 out of 6 datasets. The overall results ranked 3rd in the contest.
ER -
APA
Xie, J. & Xiong, T. (2011). Stochastic Semi-supervised Learning on Partially Labeled Imbalanced Data. Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, in Proceedings of Machine Learning Research 16:85-98. Available from https://proceedings.mlr.press/v16/xie11a.html.
