Inspecting Sample Reusability for Active Learning

Katrin Tomanek, Katherina Morik
Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, PMLR 16:169-181, 2011.

Abstract

Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
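The selector/consumer distinction in the abstract can be illustrated with a small toy experiment. This is a minimal sketch, not the authors' experimental setup. The choices below are all hypothetical illustration choices and are not taken from the paper:

- the synthetic two-blob data,
- a nearest-centroid classifier as the selector,
- margin-based uncertainty sampling as the AL strategy,
- a 1-nearest-neighbour classifier as the consumer,
- the function names `make_data`, `centroid_margin`, `select_with_al`, `one_nn_predict` and `accuracy`.

The selector picks a sample, and a different learner (the consumer) is then trained on that sample. The consumer's accuracy is compared against a random sample of the same size. The printed numbers are not meant to reproduce any result from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def make_data(n: int, rng: random.Random) -> Tuple[List[Point], List[int]]:
    """Hypothetical toy data: two Gaussian blobs; labels alternate 0, 1, 0, 1, ..."""
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        cx = 1.0 if label else -1.0
        xs.append((rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)))
        ys.append(label)
    return xs, ys


def centroid_margin(train_x: Sequence[Point], train_y: Sequence[int]) -> Callable[[Point], float]:
    """Selector: nearest-centroid classifier. Returns a margin function;
    a small |margin| means the selector is uncertain about a point."""
    cents = {}
    for lab in (0, 1):
        pts = [p for p, y in zip(train_x, train_y) if y == lab]
        cents[lab] = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

    def margin(p: Point) -> float:
        return math.dist(p, cents[0]) - math.dist(p, cents[1])

    return margin


def select_with_al(pool_x, pool_y, seed: List[int], budget: int) -> List[int]:
    """Pool-based uncertainty sampling: retrain the selector on the labeled set,
    then query the unlabeled example it is least certain about."""
    labeled = list(seed)
    seed_set = set(seed)
    unlabeled = [i for i in range(len(pool_x)) if i not in seed_set]
    for _ in range(budget):
        margin = centroid_margin([pool_x[i] for i in labeled], [pool_y[i] for i in labeled])
        query = min(unlabeled, key=lambda i: abs(margin(pool_x[i])))
        unlabeled.remove(query)
        labeled.append(query)  # the oracle reveals pool_y[query]
    return labeled


def one_nn_predict(train_x, train_y, p: Point) -> int:
    """Consumer: 1-nearest-neighbour classifier, a different learner than the selector."""
    best = min(range(len(train_x)), key=lambda i: math.dist(p, train_x[i]))
    return train_y[best]


def accuracy(train_x, train_y, test_x, test_y) -> float:
    """Test accuracy of the consumer trained on (train_x, train_y)."""
    hits = sum(one_nn_predict(train_x, train_y, p) == y for p, y in zip(test_x, test_y))
    return hits / len(test_y)


rng = random.Random(0)
pool_x, pool_y = make_data(400, rng)
test_x, test_y = make_data(200, rng)
seed = [0, 1]  # one example per class, since make_data alternates labels
budget = 30

# Sample chosen by the selector vs. a random sample of the same size.
al_idx = select_with_al(pool_x, pool_y, seed, budget)
rand_idx = seed + rng.sample(range(2, len(pool_x)), budget)

acc_al = accuracy([pool_x[i] for i in al_idx], [pool_y[i] for i in al_idx], test_x, test_y)
acc_rand = accuracy([pool_x[i] for i in rand_idx], [pool_y[i] for i in rand_idx], test_x, test_y)
print(f"consumer on AL sample: {acc_al:.3f}, on random sample: {acc_rand:.3f}")
```

Replacing `one_nn_predict` with the selector's own nearest-centroid decision gives the "same learner for selection and final model" baseline. The abstract reports that this baseline is not always the best choice.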

Cite this Paper


BibTeX
@InProceedings{pmlr-v16-tomanek11a,
  title = {Inspecting Sample Reusability for Active Learning},
  author = {Tomanek, Katrin and Morik, Katherina},
  booktitle = {Active Learning and Experimental Design workshop In conjunction with AISTATS 2010},
  pages = {169--181},
  year = {2011},
  editor = {Guyon, Isabelle and Cawley, Gavin and Dror, Gideon and Lemaire, Vincent and Statnikov, Alexander},
  volume = {16},
  series = {Proceedings of Machine Learning Research},
  address = {Sardinia, Italy},
  month = {16 May},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v16/tomanek11a/tomanek11a.pdf},
  url = {https://proceedings.mlr.press/v16/tomanek11a.html},
  abstract = {Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.}
}
Endnote
%0 Conference Paper
%T Inspecting Sample Reusability for Active Learning
%A Katrin Tomanek
%A Katherina Morik
%B Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
%C Proceedings of Machine Learning Research
%D 2011
%E Isabelle Guyon
%E Gavin Cawley
%E Gideon Dror
%E Vincent Lemaire
%E Alexander Statnikov
%F pmlr-v16-tomanek11a
%I PMLR
%P 169--181
%U https://proceedings.mlr.press/v16/tomanek11a.html
%V 16
%X Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
RIS
TY  - CPAPER
TI  - Inspecting Sample Reusability for Active Learning
AU  - Katrin Tomanek
AU  - Katherina Morik
BT  - Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
DA  - 2011/04/21
ED  - Isabelle Guyon
ED  - Gavin Cawley
ED  - Gideon Dror
ED  - Vincent Lemaire
ED  - Alexander Statnikov
ID  - pmlr-v16-tomanek11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 16
SP  - 169
EP  - 181
L1  - http://proceedings.mlr.press/v16/tomanek11a/tomanek11a.pdf
UR  - https://proceedings.mlr.press/v16/tomanek11a.html
AB  - Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
ER  -
APA
Tomanek, K. & Morik, K. (2011). Inspecting Sample Reusability for Active Learning. Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, in Proceedings of Machine Learning Research 16:169-181. Available from https://proceedings.mlr.press/v16/tomanek11a.html.
