Active Learning for Unbalanced Data in the Challenge with Multiple Models and Biasing
Active Learning and Experimental Design Workshop, in conjunction with AISTATS 2010, PMLR 16:113-126, 2011.
Abstract
The common uncertainty sampling approach queries the samples closest to the decision boundary of a classifier. However, this strategy may fail to identify genuinely uncertain samples when the underlying probabilistic model is poor. In this work, we develop an active learning strategy called “Uncertainty Sampling with Biasing Consensus” (USBC), which predicts labels for unbalanced data with a multi-model committee and ranks the informativeness of samples by uncertainty sampling, placing a higher weight on the minority class. For prediction, USBC uses multiple Random-Forests-based models that generate a consensus posterior probability for each sample. To further improve early performance in active learning, we also use a semi-supervised learning model that self-labels predicted negative samples without querying them. For more stable initial performance, we apply a filter that avoids querying samples with high prediction variance. We also introduce batch size validation to find the optimal initial batch size for querying samples in active learning.
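The core ranking step of a USBC-style strategy can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the committee is a set of independently trained scikit-learn `RandomForestClassifier` models, the consensus posterior is their averaged probability, and the exact minority-class weighting (the `minority_weight` parameter and the `usbc_rank` helper) is a hypothetical form chosen for illustration, since the abstract does not specify it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def usbc_rank(models, X_pool, minority_weight=2.0):
    """Rank pool samples by consensus uncertainty, biased toward the
    minority (positive) class. Hypothetical sketch of a USBC-like score."""
    # Consensus posterior: average P(y=1 | x) over the committee members.
    probs = np.mean([m.predict_proba(X_pool)[:, 1] for m in models], axis=0)
    # Uncertainty: 1 at the decision boundary (p = 0.5), 0 at p = 0 or 1.
    uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)
    # Bias: up-weight samples the committee predicts as minority class.
    weights = np.where(probs >= 0.5, minority_weight, 1.0)
    scores = uncertainty * weights
    # Indices of pool samples, most informative first.
    return np.argsort(-scores)

# Example: a committee of three Random Forests on unbalanced synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 1.0).astype(int)  # minority class is roughly 16% of samples
committee = [
    RandomForestClassifier(n_estimators=25, random_state=s).fit(X, y)
    for s in range(3)
]
query_order = usbc_rank(committee, X)
```

The first few indices of `query_order` would then be sent to the oracle for labeling; a variance filter, as described above, could additionally drop candidates on which the committee members disagree too strongly.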