High-dimensional Time Series Clustering via Cross-Predictability

Dezhi Hong, Quanquan Gu, Kamin Whitehouse
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:642-651, 2017.

Abstract

The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.

Cite this Paper


BibTeX
@InProceedings{pmlr-v54-hong17a, title = {{High-dimensional Time Series Clustering via Cross-Predictability}}, author = {Hong, Dezhi and Gu, Quanquan and Whitehouse, Kamin}, booktitle = {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics}, pages = {642--651}, year = {2017}, editor = {Singh, Aarti and Zhu, Jerry}, volume = {54}, series = {Proceedings of Machine Learning Research}, month = {20--22 Apr}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v54/hong17a/hong17a.pdf}, url = {https://proceedings.mlr.press/v54/hong17a.html}, abstract = {The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.} }
Endnote
%0 Conference Paper %T High-dimensional Time Series Clustering via Cross-Predictability %A Dezhi Hong %A Quanquan Gu %A Kamin Whitehouse %B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2017 %E Aarti Singh %E Jerry Zhu %F pmlr-v54-hong17a %I PMLR %P 642--651 %U https://proceedings.mlr.press/v54/hong17a.html %V 54 %X The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.
APA
Hong, D., Gu, Q. & Whitehouse, K.. (2017). High-dimensional Time Series Clustering via Cross-Predictability. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:642-651 Available from https://proceedings.mlr.press/v54/hong17a.html.

Related Material