QuantTree: Histograms for Change Detection in Multivariate Data Streams

Giacomo Boracchi, Diego Carrera, Cristiano Cervellera, Danilo Macciò
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:639-648, 2018.

Abstract

We address the problem of detecting distribution changes in multivariate data streams by means of histograms. Histograms are very general and flexible models, which have been relatively ignored in the change-detection literature as they often require a number of bins that grows unfeasibly with the data dimension. We present QuantTree, a recursive binary splitting scheme that adaptively defines the histogram bins to ease the detection of any distribution change. Our design scheme implies that i) we can easily control the overall number of bins and ii) the bin probabilities do not depend on the distribution of stationary data. This latter is a very relevant aspect in change detection, since thresholds of tests statistics based on these histograms (e.g., the Pearson statistic or the total variation) can be numerically computed from univariate and synthetically generated data, yet guaranteeing a controlled false positive rate. Our experiments show that the proposed histograms are very effective in detecting changes in high dimensional data streams, and that the resulting thresholds can effectively control the false positive rate, even when the number of training samples is relatively small.

Cite this Paper


BibTeX
@InProceedings{pmlr-v80-boracchi18a, title = {{Q}uant{T}ree: Histograms for Change Detection in Multivariate Data Streams}, author = {Boracchi, Giacomo and Carrera, Diego and Cervellera, Cristiano and Macci{\`o}, Danilo}, booktitle = {Proceedings of the 35th International Conference on Machine Learning}, pages = {639--648}, year = {2018}, editor = {Dy, Jennifer and Krause, Andreas}, volume = {80}, series = {Proceedings of Machine Learning Research}, month = {10--15 Jul}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v80/boracchi18a/boracchi18a.pdf}, url = {http://proceedings.mlr.press/v80/boracchi18a.html}, abstract = {We address the problem of detecting distribution changes in multivariate data streams by means of histograms. Histograms are very general and flexible models, which have been relatively ignored in the change-detection literature as they often require a number of bins that grows unfeasibly with the data dimension. We present QuantTree, a recursive binary splitting scheme that adaptively defines the histogram bins to ease the detection of any distribution change. Our design scheme implies that i) we can easily control the overall number of bins and ii) the bin probabilities do not depend on the distribution of stationary data. This latter is a very relevant aspect in change detection, since thresholds of tests statistics based on these histograms (e.g., the Pearson statistic or the total variation) can be numerically computed from univariate and synthetically generated data, yet guaranteeing a controlled false positive rate. Our experiments show that the proposed histograms are very effective in detecting changes in high dimensional data streams, and that the resulting thresholds can effectively control the false positive rate, even when the number of training samples is relatively small.} }
Endnote
%0 Conference Paper %T QuantTree: Histograms for Change Detection in Multivariate Data Streams %A Giacomo Boracchi %A Diego Carrera %A Cristiano Cervellera %A Danilo Macciò %B Proceedings of the 35th International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2018 %E Jennifer Dy %E Andreas Krause %F pmlr-v80-boracchi18a %I PMLR %P 639--648 %U http://proceedings.mlr.press/v80/boracchi18a.html %V 80 %X We address the problem of detecting distribution changes in multivariate data streams by means of histograms. Histograms are very general and flexible models, which have been relatively ignored in the change-detection literature as they often require a number of bins that grows unfeasibly with the data dimension. We present QuantTree, a recursive binary splitting scheme that adaptively defines the histogram bins to ease the detection of any distribution change. Our design scheme implies that i) we can easily control the overall number of bins and ii) the bin probabilities do not depend on the distribution of stationary data. This latter is a very relevant aspect in change detection, since thresholds of tests statistics based on these histograms (e.g., the Pearson statistic or the total variation) can be numerically computed from univariate and synthetically generated data, yet guaranteeing a controlled false positive rate. Our experiments show that the proposed histograms are very effective in detecting changes in high dimensional data streams, and that the resulting thresholds can effectively control the false positive rate, even when the number of training samples is relatively small.
APA
Boracchi, G., Carrera, D., Cervellera, C. & Macciò, D.. (2018). QuantTree: Histograms for Change Detection in Multivariate Data Streams. Proceedings of the 35th International Conference on Machine Learning, in Proceedings of Machine Learning Research 80:639-648 Available from http://proceedings.mlr.press/v80/boracchi18a.html.

Related Material