High-dimensional Time Series Clustering via Cross-Predictability

Dezhi Hong; Quanquan Gu; Kamin Whitehouse

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:642-651, 2017.

Abstract

The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-the-art baselines.
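As a rough illustration of the idea behind cross-predictability, the sketch below scores each pair of series by how well one series' past value predicts the other's next value, then groups series whose score is high. This is not the authors' estimator: the abstract describes a sparsity-based method for the high-dimensional regime, whereas this toy version assumes a simple pairwise lag-1 least-squares fit, a fixed threshold, and connected components in place of the paper's clustering step. The names `lag1_r2`, `cluster_by_cross_predictability`, and `make_demo_series`, the threshold of 0.5, and the sinusoidal demo data are all hypothetical choices made for this example.

```python
import math


def lag1_r2(source, target):
    """R^2 from a least-squares fit of target[t+1] on source[t] (with intercept)."""
    x = source[:-1]          # past values of the predictor series
    y = target[1:]           # next-step values of the predicted series
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0           # a constant series carries no predictive signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def cluster_by_cross_predictability(series, threshold=0.5):
    """Link series whose pairwise score reaches the threshold; return component labels."""
    n = len(series)
    adjacent = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetrize by taking the stronger of the two prediction directions.
            score = max(lag1_r2(series[i], series[j]),
                        lag1_r2(series[j], series[i]))
            if score >= threshold:
                adjacent[i][j] = adjacent[j][i] = True

    # Label connected components of the thresholded graph with a depth-first search.
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for other in range(n):
                if adjacent[node][other] and labels[other] == -1:
                    labels[other] = current
                    stack.append(other)
        current += 1
    return labels


def make_demo_series(length=200):
    """Hypothetical data: two groups of phase-shifted sinusoids, one frequency per group."""
    groups = [(0.3, (0.0, 0.1, 0.2)), (2.8, (0.0, 0.1, 0.2))]
    return [[math.sin(freq * t + phase) for t in range(length)]
            for freq, phases in groups
            for phase in phases]


if __name__ == "__main__":
    print(cluster_by_cross_predictability(make_demo_series()))
```

In this toy setting, series that share a frequency predict one another's next value well, while series from different groups do not, so the thresholded graph splits into one component per group. The pairwise fit fits each pair in isolation and does nothing to address the regime the abstract targets, where the number of series far exceeds their length; that is the gap the paper's sparsity assumption is meant to close.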

Cite this Paper

BibTeX


@InProceedings{pmlr-v54-hong17a,
  title = 	 {{High-dimensional Time Series Clustering via Cross-Predictability}},
  author = 	 {Hong, Dezhi and Gu, Quanquan and Whitehouse, Kamin},
  booktitle = 	 {Proceedings of the 20th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {642--651},
  year = 	 {2017},
  editor = 	 {Singh, Aarti and Zhu, Jerry},
  volume = 	 {54},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {20--22 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v54/hong17a/hong17a.pdf},
  url = 	 {https://proceedings.mlr.press/v54/hong17a.html},
  abstract = 	 {The key to time series clustering is how to characterize the similarity between any two time series.  In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others.  However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other.  We demonstrate that this approach is computationally attractive, and  provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee.  We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.}
}

Endnote

%0 Conference Paper
%T High-dimensional Time Series Clustering via Cross-Predictability
%A Dezhi Hong
%A Quanquan Gu
%A Kamin Whitehouse
%B Proceedings of the 20th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2017
%E Aarti Singh
%E Jerry Zhu
%F pmlr-v54-hong17a
%I PMLR
%P 642--651
%U https://proceedings.mlr.press/v54/hong17a.html
%V 54
%X The key to time series clustering is how to characterize the similarity between any two time series.  In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others.  However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other.  We demonstrate that this approach is computationally attractive, and  provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee.  We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.

APA


Hong, D., Gu, Q. & Whitehouse, K. (2017). High-dimensional Time Series Clustering via Cross-Predictability. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 54:642-651. Available from https://proceedings.mlr.press/v54/hong17a.html.
