SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Chen Luo; Anshumali Shrivastava

SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Chen Luo, Anshumali Shrivastava

Proceedings of the Time Series Workshop at NIPS 2016, PMLR 55:38-58, 2017.

Abstract

Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.

Cite this Paper

BibTeX


@InProceedings{pmlr-v55-luo16,
  title = 	 {{SSH} (Sketch, Shingle, \& Hash) for Indexing Massive-Scale Time Series},
  author = 	 {Luo, Chen and Shrivastava, Anshumali},
  booktitle = 	 {Proceedings of the Time Series Workshop at NIPS 2016},
  pages = 	 {38--58},
  year = 	 {2017},
  editor = 	 {Anava, Oren and Khaleghi, Azadeh and Cuturi, Marco and Kuznetsov, Vitaly and Rakhlin, Alexander},
  volume = 	 {55},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Barcelona, Spain},
  month = 	 {09 Dec},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v55/luo16.pdf},
  url = 	 {https://proceedings.mlr.press/v55/luo16.html},
  abstract = 	 {Similarity search on time series is a frequent operation in large-scale data-driven applications.  Sophisticated similarity measures are standard for time series matching, as they are usually misaligned.  Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines  alignment and matching at the same time.
However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound  based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short  queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds  branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH  (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the  state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching,  shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity  measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH  is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search  with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around  20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
}
}

Endnote

%0 Conference Paper
%T SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series
%A Chen Luo
%A Anshumali Shrivastava
%B Proceedings of the Time Series Workshop at NIPS 2016
%C Proceedings of Machine Learning Research
%D 2017
%E Oren Anava
%E Azadeh Khaleghi
%E Marco Cuturi
%E Vitaly Kuznetsov
%E Alexander Rakhlin	
%F pmlr-v55-luo16
%I PMLR
%P 38--58
%U https://proceedings.mlr.press/v55/luo16.html
%V 55
%X Similarity search on time series is a frequent operation in large-scale data-driven applications.  Sophisticated similarity measures are standard for time series matching, as they are usually misaligned.  Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines  alignment and matching at the same time.
However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound  based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short  queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds  branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH  (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the  state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching,  shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity  measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH  is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search  with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around  20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.

RIS


TY  - CPAPER
TI  - SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series
AU  - Chen Luo
AU  - Anshumali Shrivastava
BT  - Proceedings of the Time Series Workshop at NIPS 2016
DA  - 2017/02/16
ED  - Oren Anava
ED  - Azadeh Khaleghi
ED  - Marco Cuturi
ED  - Vitaly Kuznetsov
ED  - Alexander Rakhlin	
ID  - pmlr-v55-luo16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 55
SP  - 38
EP  - 58
L1  - http://proceedings.mlr.press/v55/luo16.pdf
UR  - https://proceedings.mlr.press/v55/luo16.html
AB  - Similarity search on time series is a frequent operation in large-scale data-driven applications.  Sophisticated similarity measures are standard for time series matching, as they are usually misaligned.  Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines  alignment and matching at the same time.
However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound  based pruning strategies are adopted. However, branch and bound based pruning are only useful for very short  queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds  branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH  (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the  state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching,  shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with DTW similarity  measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH  is very effective for longer time sequence and prunes around 95% candidates, leading to the massive speedup in search  with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around  20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.

ER  -

APA


Luo, C. & Shrivastava, A.. (2017). SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series. Proceedings of the Time Series Workshop at NIPS 2016, in Proceedings of Machine Learning Research 55:38-58 Available from https://proceedings.mlr.press/v55/luo16.html.

Related Material

Download PDF