[edit]
Center-Based Approximation of a Drifting Distribution
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:814-845, 2025.
Abstract
We present a novel technique for computing a center-based approximation of a drifting distribution. Given k≥1 and a stream of data, whose distribution is changing over time, the goal is to compute, at each step, the best k centers representation of the current distribution, despite possibly having only a single sample from the most recent distribution. In data mining, this is traditionally attempted through the sliding-window mechanism, where the analysis is performed on the most recent fixed-size segment of the data. The problems with this approach are twofold: (1) setting the correct window size is challenging; and (2) a fixed window size cannot effectively track changes in the distribution happening at variable speed. In this paper, we propose a new methodology that dynamically adjusts the window size based on the recent drift of the data. The challenge is that it is not possible to explicitly estimate the drift, as we may have only a single data point from each distribution. Our main contribution lies in providing a rigorous mathematical analysis, establishing both an upper bound via a dynamic window size algorithm, and a lower bound that shows the tightness of our approach.