Inspecting Sample Reusability for Active Learning

Katrin Tomanek; Katherina Morik

Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, PMLR 16:169-181, 2011.

Abstract

Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.

Cite this Paper

BibTeX


@InProceedings{pmlr-v16-tomanek11a,
  title = 	 {Inspecting Sample Reusability for Active Learning},
  author = 	 {Tomanek, Katrin and Morik, Katherina},
  booktitle = 	 {Active Learning and Experimental Design workshop In conjunction with AISTATS 2010},
  pages = 	 {169--181},
  year = 	 {2011},
  editor = 	 {Guyon, Isabelle and Cawley, Gavin and Dror, Gideon and Lemaire, Vincent and Statnikov, Alexander},
  volume = 	 {16},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Sardinia, Italy},
  month = 	 {16 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v16/tomanek11a/tomanek11a.pdf},
  url = 	 {https://proceedings.mlr.press/v16/tomanek11a.html},
  abstract = 	 {Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.}
}

Endnote

%0 Conference Paper
%T Inspecting Sample Reusability for Active Learning
%A Katrin Tomanek
%A Katherina Morik
%B Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
%C Proceedings of Machine Learning Research
%D 2011
%E Isabelle Guyon
%E Gavin Cawley
%E Gideon Dror
%E Vincent Lemaire
%E Alexander Statnikov
%F pmlr-v16-tomanek11a
%I PMLR
%P 169--181
%U https://proceedings.mlr.press/v16/tomanek11a.html
%V 16
%X Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.

RIS


TY  - CPAPER
TI  - Inspecting Sample Reusability for Active Learning
AU  - Katrin Tomanek
AU  - Katherina Morik
BT  - Active Learning and Experimental Design workshop In conjunction with AISTATS 2010
DA  - 2011/04/21
ED  - Isabelle Guyon
ED  - Gavin Cawley
ED  - Gideon Dror
ED  - Vincent Lemaire
ED  - Alexander Statnikov
ID  - pmlr-v16-tomanek11a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 16
SP  - 169
EP  - 181
L1  - http://proceedings.mlr.press/v16/tomanek11a/tomanek11a.pdf
UR  - https://proceedings.mlr.press/v16/tomanek11a.html
AB  - Active Learning (AL) exploits a learning algorithm to selectively sample examples which are expected to be highly useful for model learning. The resulting sample is governed by a sampling selection bias. While a bias towards useful examples is desirable, there is also a bias towards the learner applied during AL selection. This paper addresses sample reusability, i.e., the question whether and under which conditions samples selected by AL using one learning algorithm are well-suited as training data for another learning algorithm. Our empirical investigation on general classification problems as well as the natural language processing subtask of Named Entity Recognition shows that many intuitive assumptions on reusability characteristics do not hold. For example, using the same algorithm during AL selection (called selector) and for inducing the final model (called consumer) is not always the optimal choice. We investigate several putatively explanatory factors for sample reusability. One finding is that the suitability of certain selector-consumer pairings cannot be estimated independently of the actual learning problem.
ER  -

APA


Tomanek, K. & Morik, K. (2011). Inspecting Sample Reusability for Active Learning. Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, in Proceedings of Machine Learning Research 16:169-181. Available from https://proceedings.mlr.press/v16/tomanek11a.html.
