FAQ: A Framework for Fast Approximate Query Processing on Temporal Data

Udayan Khurana, Srinivasan Parthasarathy, Deepak Turaga
Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 36:29-45, 2014.

Abstract

Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications, including consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cybersecurity, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unified framework for temporal query processing on Big Data. FAQ uses a novel composition of data sketching, wavelet-style differencing for temporal compression, and quantization. It handles diverse kinds of queries, including distinct counts, set membership, frequency estimation, top-K, p-norms, empirical entropy, and distance queries such as histogram $\ell_p$-norm distance (including Euclidean and Manhattan distance), cosine similarity, Jaccard coefficient, and rank correlation. Experiments on real-life multidimensional network monitoring data sets show that FAQ achieves speedups of 92x over a flat representation of the data for a mixed temporal query workload.
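The abstract names three ingredients (sketching, wavelet-style differencing, quantization) without showing how they fit together. The Python below is a minimal sketch under stated assumptions, not the FAQ data structure from the paper: it assumes each time step is summarized by a Count-Min sketch (the abstract says only "data sketching"), uses plain first differences between consecutive sketches as a simplified stand-in for wavelet-style differencing, and omits quantization entirely. The names `CountMinSketch`, `DeltaEncodedSketches`, and the sample event keys are hypothetical. What it illustrates is the property that makes differencing workable for sketches: when all steps share the same hash functions, Count-Min tables are linear, so stored differences reconstruct each step exactly and cell-wise sums answer frequency queries over a time window.

```python
import hashlib
from typing import List

Table = List[List[int]]


def _add(a: Table, b: Table) -> Table:
    # Cell-wise sum of two equally shaped sketch tables.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def _sub(a: Table, b: Table) -> Table:
    # Cell-wise difference of two equally shaped sketch tables.
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class CountMinSketch:
    """Minimal Count-Min sketch over string keys (illustrative only)."""

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.table: Table = [[0] * width for _ in range(depth)]

    def _bucket(self, row: int, key: str) -> int:
        # Deterministic per-row hash: sketches with equal width/depth map a key
        # to the same cells, which is what makes cell-wise differencing meaningful.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.table[r][self._bucket(r, key)] += count

    def estimate(self, key: str) -> int:
        # With non-negative cells the minimum never underestimates the true count.
        return min(self.table[r][self._bucket(r, key)] for r in range(self.depth))


class DeltaEncodedSketches:
    """Hypothetical store: sketch 0 in full, then sketch[t] - sketch[t-1]."""

    def __init__(self, sketches: List[CountMinSketch]) -> None:
        self.width, self.depth = sketches[0].width, sketches[0].depth
        self.base: Table = [row[:] for row in sketches[0].table]
        self.deltas: List[Table] = [
            _sub(cur.table, prev.table) for prev, cur in zip(sketches, sketches[1:])
        ]

    def at(self, t: int) -> CountMinSketch:
        # Reconstruct time step t exactly by adding the first t deltas to the base.
        out = CountMinSketch(self.width, self.depth)
        out.table = [row[:] for row in self.base]
        for delta in self.deltas[:t]:
            out.table = _add(out.table, delta)
        return out

    def interval_estimate(self, key: str, start: int, end: int) -> int:
        # Count-Min is linear, so the cell-wise sum of the sketches for steps
        # start..end (inclusive) is a sketch of the merged stream for that window.
        current = [row[:] for row in self.base]
        total = CountMinSketch(self.width, self.depth)
        for t in range(end + 1):
            if t > 0:
                current = _add(current, self.deltas[t - 1])
            if t >= start:
                total.table = _add(total.table, current)
        return total.estimate(key)


# Usage: three time steps of hypothetical event keys.
stream = [["a", "b", "a"], ["a", "c"], ["b", "b", "a"]]
snapshots = []
for events in stream:
    sketch = CountMinSketch(width=64, depth=4)
    for e in events:
        sketch.add(e)
    snapshots.append(sketch)

store = DeltaEncodedSketches(snapshots)
print(store.at(1).estimate("a"))           # never below the true count, 1
print(store.interval_estimate("a", 0, 2))  # never below the true count, 4

# Sanity check: a single sketch built over all three steps.
combined = CountMinSketch(width=64, depth=4)
for events in stream:
    for e in events:
        combined.add(e)
```

Because every cell is an exact integer sum, `store.interval_estimate("a", 0, 2)` equals `combined.estimate("a")`; the only approximation comes from hash collisions inside the Count-Min table itself. Storing differences would likely pay off when consecutive steps are similar, since most delta cells are then zero or small; the compression and quantization stages that would exploit this are not shown here.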

Cite this Paper


BibTeX
@InProceedings{pmlr-v36-khurana14,
  title = {{FAQ}: A Framework for Fast Approximate Query Processing on Temporal Data},
  author = {Udayan Khurana and Srinivasan Parthasarathy and Deepak Turaga},
  booktitle = {Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications},
  pages = {29--45},
  year = {2014},
  editor = {Wei Fan and Albert Bifet and Qiang Yang and Philip S. Yu},
  volume = {36},
  series = {Proceedings of Machine Learning Research},
  address = {New York, New York, USA},
  month = {24 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v36/khurana14.pdf},
  url = {http://proceedings.mlr.press/v36/khurana14.html},
  abstract = {Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications, including consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cybersecurity, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unified framework for temporal query processing on Big Data. FAQ uses a novel composition of data sketching, wavelet-style differencing for temporal compression, and quantization. It handles diverse kinds of queries, including distinct counts, set membership, frequency estimation, top-K, p-norms, empirical entropy, and distance queries such as histogram $\ell_p$-norm distance (including Euclidean and Manhattan distance), cosine similarity, Jaccard coefficient, and rank correlation. Experiments on real-life multidimensional network monitoring data sets show that FAQ achieves speedups of 92x over a flat representation of the data for a mixed temporal query workload.}
}
Endnote
%0 Conference Paper
%T FAQ: A Framework for Fast Approximate Query Processing on Temporal Data
%A Udayan Khurana
%A Srinivasan Parthasarathy
%A Deepak Turaga
%B Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
%C Proceedings of Machine Learning Research
%D 2014
%E Wei Fan
%E Albert Bifet
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v36-khurana14
%I PMLR
%J Proceedings of Machine Learning Research
%P 29-45
%U http://proceedings.mlr.press
%V 36
%W PMLR
%X Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications, including consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cybersecurity, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unified framework for temporal query processing on Big Data. FAQ uses a novel composition of data sketching, wavelet-style differencing for temporal compression, and quantization. It handles diverse kinds of queries, including distinct counts, set membership, frequency estimation, top-K, p-norms, empirical entropy, and distance queries such as histogram $\ell_p$-norm distance (including Euclidean and Manhattan distance), cosine similarity, Jaccard coefficient, and rank correlation. Experiments on real-life multidimensional network monitoring data sets show that FAQ achieves speedups of 92x over a flat representation of the data for a mixed temporal query workload.
RIS
TY  - CPAPER
TI  - FAQ: A Framework for Fast Approximate Query Processing on Temporal Data
AU  - Udayan Khurana
AU  - Srinivasan Parthasarathy
AU  - Deepak Turaga
BT  - Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
PY  - 2014/08/13
DA  - 2014/08/13
ED  - Wei Fan
ED  - Albert Bifet
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v36-khurana14
PB  - PMLR
SP  - 29
DP  - PMLR
EP  - 45
L1  - http://proceedings.mlr.press/v36/khurana14.pdf
UR  - http://proceedings.mlr.press/v36/khurana14.html
AB  - Temporal queries on time-evolving data are at the heart of a broad range of business and network intelligence applications, including consumer behavior analysis, trend analysis, temporal pattern mining, sentiment analysis on social media, cybersecurity, and network monitoring. In this work, we present an innovative data structure called Fast Approximate Query-able (FAQ), which provides a unified framework for temporal query processing on Big Data. FAQ uses a novel composition of data sketching, wavelet-style differencing for temporal compression, and quantization. It handles diverse kinds of queries, including distinct counts, set membership, frequency estimation, top-K, p-norms, empirical entropy, and distance queries such as histogram $\ell_p$-norm distance (including Euclidean and Manhattan distance), cosine similarity, Jaccard coefficient, and rank correlation. Experiments on real-life multidimensional network monitoring data sets show that FAQ achieves speedups of 92x over a flat representation of the data for a mixed temporal query workload.
ER  - 
APA
Khurana, U., Parthasarathy, S., & Turaga, D. (2014). FAQ: A Framework for Fast Approximate Query Processing on Temporal Data. Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in PMLR 36:29-45.
