Online Clustering of Processes


Azadeh Khaleghi, Daniil Ryabko, Jeremie Mary, Philippe Preux
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, PMLR 22:601-609, 2012.


The problem of online clustering is considered in the case where each data point is a sequence generated by a stationary ergodic process. Data arrive in an online fashion so that the sample received at every time-step is either a continuation of some previously received sequence or a new sequence. The dependence between the sequences can be arbitrary. No parametric or independence assumptions are made; the only assumption is that the marginal distribution of each sequence is stationary and ergodic. A novel, computationally efficient algorithm is proposed and is shown to be asymptotically consistent (under a natural notion of consistency). The performance of the proposed algorithm is evaluated on simulated data, as well as on real datasets (motion classification).
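The online setting described in the abstract can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper. The names `OnlineSequenceStore`, `receive`, `empirical_frequencies`, and `empirical_distance`, the use of sequence identifiers, and the weights `2 ** -m` are all assumptions made for illustration. The sketch shows two things: each arriving sample either continues a previously seen sequence or starts a new one, and sequences can be compared through empirical word frequencies, which converge for stationary ergodic processes without any parametric model.

```python
from collections import defaultdict


class OnlineSequenceStore:
    """Holds the sequences seen so far, keyed by a hypothetical sequence id."""

    def __init__(self):
        self.sequences = defaultdict(list)

    def receive(self, seq_id, samples):
        """Append samples to sequence seq_id; return True if the sequence is new."""
        is_new = seq_id not in self.sequences
        self.sequences[seq_id].extend(samples)
        return is_new


def empirical_frequencies(seq, word_len):
    """Relative frequencies of all contiguous words of length word_len in seq."""
    n = len(seq) - word_len + 1
    if n <= 0:
        return {}
    counts = defaultdict(int)
    for i in range(n):
        counts[tuple(seq[i:i + word_len])] += 1
    return {word: count / n for word, count in counts.items()}


def empirical_distance(x, y, max_len=3):
    """Illustrative distance (an assumption, not the paper's definition):
    weighted sum, over word lengths 1..max_len, of the L1 difference
    between the empirical word frequencies of x and y."""
    total = 0.0
    for m in range(1, max_len + 1):
        fx = empirical_frequencies(x, m)
        fy = empirical_frequencies(y, m)
        diff = sum(abs(fx.get(w, 0.0) - fy.get(w, 0.0)) for w in set(fx) | set(fy))
        total += 2.0 ** (-m) * diff
    return total


# Usage: two sequences arrive in interleaved chunks.
store = OnlineSequenceStore()
store.receive("a", [0, 1, 0, 1])   # new sequence
store.receive("b", [1, 1, 1, 1])   # new sequence
store.receive("a", [0, 1])         # continuation of "a"
d = empirical_distance(store.sequences["a"], store.sequences["b"])
```

A clustering step would then group the stored sequences using such pairwise distances, recomputed as more samples arrive; how that is done consistently is the subject of the paper and is not reproduced here.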
