[edit]
Coresets for Data Discretization and Sine Wave Fitting
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10622-10639, 2022.
Abstract
In the monitoring problem, the input is an unbounded stream P=p1,p2⋯ of integers in [N]:={1,⋯,N}, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the n points received so far in P by a single frequency sin, e.g. min, where cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2\pi}{N} p_ic), C\subseteq [N] is a feasible set of solutions, and \lambda is a given regularization function. For any approximation error \varepsilon>0, we prove that every set P of n integers has a weighted subset S\subseteq P (sometimes called core-set) of cardinality |S|\in O(\log(N)^{O(1)}) that approximates cost(P,c) (for every c\in [N]) up to a multiplicative factor of 1\pm\varepsilon. Using known coreset techniques, this implies streaming algorithms using only O((\log(N)\log(n))^{O(1)}) memory. Our results hold for a large family of functions. Experimental results and open source code are provided.