On the Need of Class Ratio Insensitive Drift Tests for Data Streams


André Maletzke, Denis Reis, Everton Cherman, Gustavo Batista;
Proceedings of the Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, PMLR 94:110-124, 2018.

Abstract

Early approaches to detecting concept drifts in data streams without actual class labels aim at minimizing external labeling costs. However, their reliability is dubious when they are presented with changes in the proportion of the classes over time, as such methods keep reporting concept drifts that would not damage the performance of the current classification model. In this paper, we present an approach that detects changes in the distribution of the features while remaining insensitive to changes in the distribution of the classes. The method also provides an estimate of the current class ratio and uses it to adapt the decision threshold of a classification model trained with balanced data. We show that the classification performance achieved by such a modified classifier is greater than that of a classifier trained with the same class distribution as the current imbalanced data.
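The threshold-adaptation step described above can be illustrated with the standard prior-shift correction for posterior probabilities. The sketch below is an assumption for illustration only, not the paper's exact method: the function name `adjust_posterior` and the choice of correction formula are hypothetical. It assumes a classifier trained with balanced classes (prior 0.5 per class) whose scores are corrected using an estimated positive-class ratio, which is equivalent to shifting the decision threshold.

```python
import numpy as np

def adjust_posterior(p_balanced, pi_pos):
    """Correct scores from a model trained on balanced data for a new class prior.

    Standard prior-shift correction (illustrative, not the paper's exact method):
    since the model was trained with priors 0.5/0.5, reweighting each class's
    likelihood by the estimated prior pi_pos gives
        p' = p * pi_pos / (p * pi_pos + (1 - p) * (1 - pi_pos)).
    Thresholding p' at 0.5 is equivalent to thresholding the original
    balanced score p at (1 - pi_pos).
    """
    p_balanced = np.asarray(p_balanced, dtype=float)
    num = p_balanced * pi_pos
    den = num + (1.0 - p_balanced) * (1.0 - pi_pos)
    return num / den

# A balanced-model score of 0.5 maps exactly to the estimated prior,
# so uncertain examples are pushed toward the minority-aware decision.
scores = np.array([0.2, 0.5, 0.8])
print(adjust_posterior(scores, 0.1))
```

For example, with an estimated positive-class ratio of 0.1, a balanced score of 0.5 becomes 0.1 after correction, so only examples the balanced model scores above 0.9 would cross a 0.5 decision threshold.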
