Mining Recurring Concept Drifts with Limited Labeled Streaming Data

Peipei Li; Xindong Wu; Xuegang Hu

Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:241-252, 2010.

Abstract

Tracking recurring concept drifts is a significant issue for machine learning and data mining that frequently appears in real-world stream classification problems. It is a challenge for many streaming classification algorithms to learn recurring concepts in a data stream environment with unlabeled data, and this challenge has received little attention from the research community. Motivated by this challenge, this paper focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a Semi-supervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which a decision tree is adopted as the classification model. When growing a tree, a clustering algorithm based on k-Means is used to produce concept clusters, and unlabeled data are labeled at leaves. Based on deviations between historical and new concept clusters, potential concept drifts are distinguished and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of our REDLLA algorithm over two state-of-the-art online classification algorithms, CVFDT and CDRDT, and several known online semi-supervised algorithms, even when more than 90% of the data are unlabeled.
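The abstract's idea of labeling unlabeled data through k-Means clusters can be illustrated with a small sketch. This is not the REDLLA algorithm itself: the abstract does not give its details, so the majority-vote labeling rule, the deterministic seeding, and the names `kmeans` and `label_unlabeled` are all assumptions made for illustration. The sketch clusters the instances that would sit at one leaf and gives each unlabeled instance the majority label of its cluster.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assign


def label_unlabeled(points, labels, k):
    """Fill in None labels with the majority label of the point's cluster.

    Hypothetical rule for illustration; clusters that contain no labeled
    point leave their members unlabeled (None).
    """
    _, assign = kmeans(points, k)
    filled = list(labels)
    for c in range(k):
        known = [labels[i] for i, a in enumerate(assign)
                 if a == c and labels[i] is not None]
        if not known:
            continue
        majority = Counter(known).most_common(1)[0][0]
        for i, a in enumerate(assign):
            if a == c and filled[i] is None:
                filled[i] = majority
    return filled


if __name__ == "__main__":
    # Two labeled instances and four unlabeled ones at a single leaf.
    points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.1, 4.9), (0.2, 0.1), (4.8, 5.2)]
    labels = ["a", "b", None, None, None, None]
    print(label_unlabeled(points, labels, k=2))
```

Seeding the centroids with the first k points keeps the sketch deterministic; a real implementation would more likely use random or k-means++ initialization and would also have to compare new clusters against stored ones to detect drift, which is outside what this sketch covers.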

Cite this Paper

BibTeX

@InProceedings{pmlr-v13-li10a,
  title = 	 {Mining Recurring Concept Drifts with Limited Labeled Streaming Data},
  author = 	 {Li, Peipei and Wu, Xindong and Hu, Xuegang},
  booktitle = 	 {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = 	 {241--252},
  year = 	 {2010},
  editor = 	 {Sugiyama, Masashi and Yang, Qiang},
  volume = 	 {13},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Tokyo, Japan},
  month = 	 {08--10 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v13/li10a/li10a.pdf},
  url = 	 {https://proceedings.mlr.press/v13/li10a.html},
  abstract = 	 {Tracking recurring concept drifts is a significant issue for machine learning and data mining that frequently appears in real world stream classification problems. It is a challenge for many streaming classification algorithms to learn recurring concepts in a data stream environment with unlabeled data, and this challenge has received little attention from the research community. Motivated by this challenge, this paper focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a Semisupervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which, a decision tree is adopted as the classification model. When growing a tree, a clustering algorithm based on k-Means is installed to produce concept clusters and unlabeled data are labeled at leaves. In view of deviations between history and new concept clusters, potential concept drifts are distinguished and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of our REDLLA algorithm over two state-of-the-art online classification algorithms of CVFDT and CDRDT and several known online semi-supervised algorithms, even in the case with more than 90% unlabeled data.}
}

Endnote

%0 Conference Paper
%T Mining Recurring Concept Drifts with Limited Labeled Streaming Data
%A Peipei Li
%A Xindong Wu
%A Xuegang Hu
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-li10a
%I PMLR
%P 241--252
%U https://proceedings.mlr.press/v13/li10a.html
%V 13
%X Tracking recurring concept drifts is a significant issue for machine learning and data mining that frequently appears in real world stream classification problems. It is a challenge for many streaming classification algorithms to learn recurring concepts in a data stream environment with unlabeled data, and this challenge has received little attention from the research community. Motivated by this challenge, this paper focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a Semisupervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which, a decision tree is adopted as the classification model. When growing a tree, a clustering algorithm based on k-Means is installed to produce concept clusters and unlabeled data are labeled at leaves. In view of deviations between history and new concept clusters, potential concept drifts are distinguished and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of our REDLLA algorithm over two state-of-the-art online classification algorithms of CVFDT and CDRDT and several known online semi-supervised algorithms, even in the case with more than 90% unlabeled data.

RIS

TY  - CPAPER
TI  - Mining Recurring Concept Drifts with Limited Labeled Streaming Data
AU  - Peipei Li
AU  - Xindong Wu
AU  - Xuegang Hu
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang
ID  - pmlr-v13-li10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 241
EP  - 252
L1  - http://proceedings.mlr.press/v13/li10a/li10a.pdf
UR  - https://proceedings.mlr.press/v13/li10a.html
AB  - Tracking recurring concept drifts is a significant issue for machine learning and data mining that frequently appears in real world stream classification problems. It is a challenge for many streaming classification algorithms to learn recurring concepts in a data stream environment with unlabeled data, and this challenge has received little attention from the research community. Motivated by this challenge, this paper focuses on the problem of recurring contexts in streaming environments with limited labeled data. We propose a Semisupervised classification algorithm for data streams with REcurring concept Drifts and Limited LAbeled data, called REDLLA, in which, a decision tree is adopted as the classification model. When growing a tree, a clustering algorithm based on k-Means is installed to produce concept clusters and unlabeled data are labeled at leaves. In view of deviations between history and new concept clusters, potential concept drifts are distinguished and recurring concepts are maintained. Extensive studies on both synthetic and real-world data confirm the advantages of our REDLLA algorithm over two state-of-the-art online classification algorithms of CVFDT and CDRDT and several known online semi-supervised algorithms, even in the case with more than 90% unlabeled data.
ER  -

APA

Li, P., Wu, X. & Hu, X. (2010). Mining Recurring Concept Drifts with Limited Labeled Streaming Data. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:241-252. Available from https://proceedings.mlr.press/v13/li10a.html.
