Center-Based Approximation of a Drifting Distribution

Alessio Mazzetto, Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:814-845, 2025.

Abstract

We present a novel technique for computing a center-based approximation of a drifting distribution. Given k1 and a stream of data, whose distribution is changing over time, the goal is to compute, at each step, the best k centers representation of the current distribution, despite possibly having only a single sample from the most recent distribution. In data mining, this is traditionally attempted through the sliding-window mechanism, where the analysis is performed on the most recent fixed-size segment of the data. The problems with this approach are twofold: (1) setting the correct window size is challenging; and (2) a fixed window size cannot effectively track changes in the distribution happening at variable speed. In this paper, we propose a new methodology that dynamically adjusts the window size based on the recent drift of the data. The challenge is that it is not possible to explicitly estimate the drift, as we may have only a single data point from each distribution. Our main contribution lies in providing a rigorous mathematical analysis, establishing both an upper bound via a dynamic window size algorithm, and a lower bound that shows the tightness of our approach.

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-mazzetto25a, title = {Center-Based Approximation of a Drifting Distribution}, author = {Mazzetto, Alessio and Ceccarello, Matteo and Pietracaprina, Andrea and Pucci, Geppino and Upfal, Eli}, booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory}, pages = {814--845}, year = {2025}, editor = {Kamath, Gautam and Loh, Po-Ling}, volume = {272}, series = {Proceedings of Machine Learning Research}, month = {24--27 Feb}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/mazzetto25a/mazzetto25a.pdf}, url = {https://proceedings.mlr.press/v272/mazzetto25a.html}, abstract = {We present a novel technique for computing a center-based approximation of a drifting distribution. Given $k \geq 1$ and a stream of data, whose distribution is changing over time, the goal is to compute, at each step, the best $k$ centers representation of the current distribution, despite possibly having only a single sample from the most recent distribution. In data mining, this is traditionally attempted through the sliding-window mechanism, where the analysis is performed on the most recent fixed-size segment of the data. The problems with this approach are twofold: (1) setting the correct window size is challenging; and (2) a fixed window size cannot effectively track changes in the distribution happening at variable speed. In this paper, we propose a new methodology that dynamically adjusts the window size based on the recent drift of the data. The challenge is that it is not possible to explicitly estimate the drift, as we may have only a single data point from each distribution. Our main contribution lies in providing a rigorous mathematical analysis, establishing both an upper bound via a dynamic window size algorithm, and a lower bound that shows the tightness of our approach.} }
Endnote
%0 Conference Paper %T Center-Based Approximation of a Drifting Distribution %A Alessio Mazzetto %A Matteo Ceccarello %A Andrea Pietracaprina %A Geppino Pucci %A Eli Upfal %B Proceedings of The 36th International Conference on Algorithmic Learning Theory %C Proceedings of Machine Learning Research %D 2025 %E Gautam Kamath %E Po-Ling Loh %F pmlr-v272-mazzetto25a %I PMLR %P 814--845 %U https://proceedings.mlr.press/v272/mazzetto25a.html %V 272 %X We present a novel technique for computing a center-based approximation of a drifting distribution. Given $k \geq 1$ and a stream of data, whose distribution is changing over time, the goal is to compute, at each step, the best $k$ centers representation of the current distribution, despite possibly having only a single sample from the most recent distribution. In data mining, this is traditionally attempted through the sliding-window mechanism, where the analysis is performed on the most recent fixed-size segment of the data. The problems with this approach are twofold: (1) setting the correct window size is challenging; and (2) a fixed window size cannot effectively track changes in the distribution happening at variable speed. In this paper, we propose a new methodology that dynamically adjusts the window size based on the recent drift of the data. The challenge is that it is not possible to explicitly estimate the drift, as we may have only a single data point from each distribution. Our main contribution lies in providing a rigorous mathematical analysis, establishing both an upper bound via a dynamic window size algorithm, and a lower bound that shows the tightness of our approach.
APA
Mazzetto, A., Ceccarello, M., Pietracaprina, A., Pucci, G. & Upfal, E.. (2025). Center-Based Approximation of a Drifting Distribution. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:814-845 Available from https://proceedings.mlr.press/v272/mazzetto25a.html.

Related Material