An Active Learning Algorithm Based on Parzen Window Classification

Liang Lan, Haidong Shi, Zhuang Wang, Slobodan Vucetic
Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, JMLR Workshop and Conference Proceedings 16:99-112, 2011.

Abstract

This paper describes the active learning algorithm used in the AISTATS 2010 Active Learning Challenge, as well as several of its extensions evaluated in post-competition experiments. The algorithm consists of a pair of Regularized Parzen Window Classifiers, one trained on the full set of features and the other on features filtered using Pearson correlation. Predictions of the two classifiers are averaged to obtain the ensemble classifier. The Parzen Window classifier was chosen because it is an easy-to-implement lazy algorithm with a single parameter, the kernel window size, which is determined by cross-validation. The labeling schedule started by selecting 20 random examples and then doubled the number of labeled examples in each round of active learning. A combination of random sampling and uncertainty sampling was used for querying. For the random sampling, examples were first clustered using either all features or the filtered features (whichever resulted in higher cross-validated accuracy), and then the same number of random examples was selected from each cluster. Our algorithm ranked 5th overall and was consistently ranked in the upper half of the competing algorithms. The challenge results show that Parzen Window classifiers are less accurate than several competing learning algorithms used in the competition, but also indicate the success of the simple querying strategy that was employed. In the post-competition experiments, we were able to improve the accuracy by using an ensemble of 5 Parzen Window classifiers, each trained on features selected by a different filter. We also explored how more involved querying during the initial stages of active learning and the pre-clustering querying strategy would influence the performance of the proposed algorithm.
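
The querying loop summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian kernel, binary 0/1 labels, and plain (unregularized) Parzen window posteriors, and it omits the feature filtering, the two-classifier ensemble, and the clustering step. The function names and the `random_fraction` parameter are hypothetical; the abstract does not state how random and uncertainty sampling were mixed.

```python
import math
import random


def parzen_posterior(x, labeled_x, labeled_y, sigma):
    """Estimate P(y = 1 | x) with a Gaussian Parzen window.

    labeled_y holds 0/1 labels; sigma is the kernel window size
    (assumed here to be chosen beforehand, e.g. by cross-validation).
    """
    pos = total = 0.0
    for xi, yi in zip(labeled_x, labeled_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        k = math.exp(-sq_dist / (2.0 * sigma ** 2))
        total += k
        pos += k * yi
    # Fall back to 0.5 if every kernel value underflows to zero.
    return pos / total if total > 0.0 else 0.5


def select_queries(pool, labeled_x, labeled_y, n_queries, sigma,
                   random_fraction=0.5, rng=None):
    """Pick pool indices to label: some at random, the rest by uncertainty.

    random_fraction is an assumed knob, not a value from the paper.
    """
    rng = rng or random.Random(0)
    n_random = min(int(n_queries * random_fraction), len(pool))
    chosen = set(rng.sample(range(len(pool)), n_random))
    # Uncertainty sampling: prefer posteriors closest to 0.5.
    rest = [i for i in range(len(pool)) if i not in chosen]
    rest.sort(key=lambda i: abs(
        parzen_posterior(pool[i], labeled_x, labeled_y, sigma) - 0.5))
    chosen.update(rest[:n_queries - n_random])
    return sorted(chosen)


def doubling_schedule(start, pool_size):
    """Cumulative labeled-set sizes: start, 2*start, ..., capped at pool_size."""
    sizes, n = [], start
    while n < pool_size:
        sizes.append(n)
        n *= 2
    sizes.append(pool_size)
    return sizes
```

With `doubling_schedule(20, pool_size)` giving the cumulative label budget per round, each round would call `select_queries` for the difference between consecutive budget sizes and then add the newly labeled points to `labeled_x` and `labeled_y`.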

Cite this Paper


BibTeX
@InProceedings{pmlr-v16-lan11a,
  title = {An Active Learning Algorithm Based on Parzen Window Classification},
  author = {Liang Lan and Haidong Shi and Zhuang Wang and Slobodan Vucetic},
  booktitle = {Active Learning and Experimental Design workshop In conjunction with AISTATS 2010},
  pages = {99--112},
  year = {2011},
  editor = {Isabelle Guyon and Gavin Cawley and Gideon Dror and Vincent Lemaire and Alexander Statnikov},
  volume = {16},
  series = {Proceedings of Machine Learning Research},
  address = {Sardinia, Italy},
  month = {16 May},
  publisher = {JMLR Workshop and Conference Proceedings},
  pdf = {http://proceedings.mlr.press/v16/lan11a/lan11a.pdf},
  url = {http://proceedings.mlr.press/v16/lan11a.html}
}
Endnote
%0 Conference Paper
%T An Active Learning Algorithm Based on Parzen Window Classification
%A Liang Lan
%A Haidong Shi
%A Zhuang Wang
%A Slobodan Vucetic
%B Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
%C Proceedings of Machine Learning Research
%D 2011
%E Isabelle Guyon
%E Gavin Cawley
%E Gideon Dror
%E Vincent Lemaire
%E Alexander Statnikov
%F pmlr-v16-lan11a
%I PMLR
%J Proceedings of Machine Learning Research
%P 99--112
%U http://proceedings.mlr.press
%V 16
%W PMLR
RIS
TY  - CPAPER
TI  - An Active Learning Algorithm Based on Parzen Window Classification
AU  - Liang Lan
AU  - Haidong Shi
AU  - Zhuang Wang
AU  - Slobodan Vucetic
BT  - Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
PY  - 2011/04/21
DA  - 2011/04/21
ED  - Isabelle Guyon
ED  - Gavin Cawley
ED  - Gideon Dror
ED  - Vincent Lemaire
ED  - Alexander Statnikov
ID  - pmlr-v16-lan11a
PB  - PMLR
SP  - 99
DP  - PMLR
EP  - 112
L1  - http://proceedings.mlr.press/v16/lan11a/lan11a.pdf
UR  - http://proceedings.mlr.press/v16/lan11a.html
ER  -
APA
Lan, L., Shi, H., Wang, Z. & Vucetic, S. (2011). An Active Learning Algorithm Based on Parzen Window Classification. Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, in PMLR 16:99-112
