High-dimensional Time Series Clustering via Cross-Predictability
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:642-651, 2017.
The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.