High density-focused uncertainty sampling for active learning over evolving stream data

Dino Ienco; Indrė Žliobaitė; Bernhard Pfahringer

Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 36:133-148, 2014.

Abstract

Data labeling is an expensive and time-consuming task, so carefully choosing which instances to label for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels for a small, representative fraction of the data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. We propose DBALStream, a new active learning method for evolving data streams that combines density and prediction uncertainty. Our approach decides whether to label an instance by considering whether it lies in a high-density partition of the data space. This focuses labeling effort on the regions of the instance space where data is concentrated, where the benefit of learning a more accurate classifier is expected to be highest. Instance density is approximated online with a sliding window, a standard technique for data streams. We compare our method with state-of-the-art active learning strategies on benchmark datasets, and the experimental analysis shows that the new approach achieves good predictive performance.
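The decision rule described in the abstract (query a label only for uncertain instances that fall in a dense region, with density estimated over a sliding window) can be illustrated with a minimal sketch. It assumes numeric feature vectors, Euclidean distance, a fixed-size window, a radius-based neighbor count as the density proxy, and a fixed confidence threshold on the top class posterior. The class name and the parameters `window_size`, `radius`, `min_neighbors`, and `threshold` are hypothetical; this is not the DBALStream rule from the paper, whose exact density measure and budget handling should be taken from the PDF.

```python
from collections import deque
import math


class DensityUncertaintySampler:
    """Hypothetical density-gated uncertainty sampler (illustrative sketch only).

    Queries a label only when the instance lies in a dense region of the
    recent stream and the classifier is uncertain about it.
    """

    def __init__(self, window_size=100, radius=1.0, min_neighbors=3, threshold=0.7):
        self.window = deque(maxlen=window_size)  # oldest instances drop out automatically
        self.radius = radius                     # assumed neighborhood radius (Euclidean)
        self.min_neighbors = min_neighbors       # assumed density cut-off
        self.threshold = threshold               # assumed confidence cut-off

    def local_density(self, x):
        # Density proxy: number of windowed instances within `radius` of x.
        return sum(1 for w in self.window if math.dist(x, w) <= self.radius)

    def should_query(self, x, posteriors):
        # Dense: enough recent instances lie close to x.
        dense = self.local_density(x) >= self.min_neighbors
        # Uncertain: the top class posterior falls below the confidence cut-off.
        uncertain = max(posteriors) < self.threshold
        self.window.append(x)  # every instance enters the window, labeled or not
        return dense and uncertain
```

For example, with `min_neighbors=2`, an uncertain instance arriving next to two earlier points is queried, whereas an equally uncertain outlier far from the window is not. A practical stream learner would also enforce a labeling budget, which this sketch omits.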

Cite this Paper

BibTeX


@InProceedings{pmlr-v36-ienco14,
  title = 	 {High density-focused uncertainty sampling for active learning over evolving stream data},
  author = 	 {Ienco, Dino and Žliobaitė, Indrė and Pfahringer, Bernhard},
  booktitle = 	 {Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications},
  pages = 	 {133--148},
  year = 	 {2014},
  editor = 	 {Fan, Wei and Bifet, Albert and Yang, Qiang and Yu, Philip S.},
  volume = 	 {36},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {24 Aug},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v36/ienco14.pdf},
  url = 	 {https://proceedings.mlr.press/v36/ienco14.html},
  abstract = 	 {Data labeling is an expensive and time-consuming task, hence carefully choosing which labels to use for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels from a small representative fraction of data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. We propose a new active learning method for evolving data streams based on a combination of density and prediction uncertainty (DBALStream). Our approach decides to label an instance or not, considering whether it lies in a high density partition of the data space. This allows focusing labelling efforts in the instance space where more data is concentrated; hence, the benefits of learning a more accurate classifier are expected to be higher. Instance density is approximated in an online manner by a sliding window mechanism, a standard technique for data streams. We compare our method with state-of-the-art active learning strategies over benchmark datasets. The experimental analysis demonstrates good predictive performance of the new approach.}
}

Endnote

%0 Conference Paper
%T High density-focused uncertainty sampling for active learning over evolving stream data
%A Dino Ienco
%A Indrė Žliobaitė
%A Bernhard Pfahringer
%B Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
%C Proceedings of Machine Learning Research
%D 2014
%E Wei Fan
%E Albert Bifet
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v36-ienco14
%I PMLR
%P 133-148
%U https://proceedings.mlr.press/v36/ienco14.html
%V 36
%X Data labeling is an expensive and time-consuming task, hence carefully choosing which labels to use for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels from a small representative fraction of data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. We propose a new active learning method for evolving data streams based on a combination of density and prediction uncertainty (DBALStream). Our approach decides to label an instance or not, considering whether it lies in a high density partition of the data space. This allows focusing labelling efforts in the instance space where more data is concentrated; hence, the benefits of learning a more accurate classifier are expected to be higher. Instance density is approximated in an online manner by a sliding window mechanism, a standard technique for data streams. We compare our method with state-of-the-art active learning strategies over benchmark datasets. The experimental analysis demonstrates good predictive performance of the new approach.

RIS


TY  - CPAPER
TI  - High density-focused uncertainty sampling for active learning over evolving stream data
AU  - Dino Ienco
AU  - Indrė Žliobaitė
AU  - Bernhard Pfahringer
BT  - Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
DA  - 2014/08/13
ED  - Wei Fan
ED  - Albert Bifet
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v36-ienco14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 36
SP  - 133
EP  - 148
L1  - https://proceedings.mlr.press/v36/ienco14.pdf
UR  - https://proceedings.mlr.press/v36/ienco14.html
AB  - Data labeling is an expensive and time-consuming task, hence carefully choosing which labels to use for training a model is becoming increasingly important. In the active learning setting, a classifier is trained by querying labels from a small representative fraction of data. While many approaches exist for non-streaming scenarios, few works consider the challenges of the data stream setting. We propose a new active learning method for evolving data streams based on a combination of density and prediction uncertainty (DBALStream). Our approach decides to label an instance or not, considering whether it lies in a high density partition of the data space. This allows focusing labelling efforts in the instance space where more data is concentrated; hence, the benefits of learning a more accurate classifier are expected to be higher. Instance density is approximated in an online manner by a sliding window mechanism, a standard technique for data streams. We compare our method with state-of-the-art active learning strategies over benchmark datasets. The experimental analysis demonstrates good predictive performance of the new approach.
ER  -

APA


Ienco, D., Žliobaitė, I., & Pfahringer, B. (2014). High density-focused uncertainty sampling for active learning over evolving stream data. Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in Proceedings of Machine Learning Research 36:133-148. Available from https://proceedings.mlr.press/v36/ienco14.html.
