Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree

Zhinoos Razavi Hesabi; Timos Sellis; Xiuzhen Zhang

Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 41:19-32, 2015.

Abstract

With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated. Clustering multiple data streams is challenging because the requirement to cluster at any time becomes more critical. We aim to cluster multiple data streams concurrently, and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure to achieve more granular micro-clusters (summaries) of multiple streams at any time. We devise algorithms to concurrently search, expand and update the hierarchical tree structure that stores the micro-clusters, along with an algorithm for anytime concurrent clustering of multiple streams. As this is work in progress, we plan to test our proposed algorithms on sensor data sets and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of the number of created clusters and compare our technique with other approaches.
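
The abstract describes micro-clusters as summary statistics for streaming data objects. As a rough illustration only, the sketch below assumes the cluster-feature form commonly used in stream clustering: a count, a per-dimension linear sum, and a per-dimension squared sum. The class name `MicroCluster` and its methods are hypothetical and are not taken from the paper. The sketch does not model the indexing tree or any concurrency control.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical cluster-feature summary (n, LS, SS) of d-dimensional points."""
    dim: int
    n: int = 0                                      # number of points absorbed
    ls: List[float] = field(default_factory=list)   # per-dimension linear sum
    ss: List[float] = field(default_factory=list)   # per-dimension squared sum

    def __post_init__(self) -> None:
        # Start from an empty summary when no sums were supplied.
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one stream object by updating the three statistics."""
        if len(point) != self.dim:
            raise ValueError("point has the wrong dimensionality")
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other: "MicroCluster") -> None:
        """Combine two summaries; the statistics are additive."""
        if other.dim != self.dim:
            raise ValueError("dimensionality mismatch")
        self.n += other.n
        for i in range(self.dim):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def mean(self) -> List[float]:
        """Centroid recovered from the summary alone."""
        if self.n == 0:
            raise ValueError("empty micro-cluster")
        return [s / self.n for s in self.ls]

    def variance(self) -> List[float]:
        """Per-dimension variance, E[x^2] - E[x]^2, clamped at zero."""
        m = self.mean()
        return [max(self.ss[i] / self.n - m[i] ** 2, 0.0)
                for i in range(self.dim)]
```

Because the three statistics are additive, two summaries can be merged without revisiting the raw points, which is what makes a summary of this kind practical for unbounded streams.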

Cite this Paper

BibTeX


@InProceedings{pmlr-v41-razavi15,
  title = 	 {Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree},
  author = 	 {Razavi Hesabi, Zhinoos and Sellis, Timos and Zhang, Xiuzhen},
  booktitle = 	 {Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications},
  pages = 	 {19--32},
  year = 	 {2015},
  editor = 	 {Fan, Wei and Bifet, Albert and Yang, Qiang and Yu, Philip S.},
  volume = 	 {41},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v41/razavi15.pdf},
  url = 	 {https://proceedings.mlr.press/v41/razavi15.html},
  abstract = 	 {With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated.  Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure  to achieve more granular micro clusters (summaries) of multiple streams  at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams.  As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches. }
}

Endnote

%0 Conference Paper
%T Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree
%A Zhinoos Razavi Hesabi
%A Timos Sellis
%A Xiuzhen Zhang
%B Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
%C Proceedings of Machine Learning Research
%D 2015
%E Wei Fan
%E Albert Bifet
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v41-razavi15
%I PMLR
%P 19--32
%U https://proceedings.mlr.press/v41/razavi15.html
%V 41
%X With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated.  Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure  to achieve more granular micro clusters (summaries) of multiple streams  at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams.  As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches.

RIS


TY  - CPAPER
TI  - Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree
AU  - Zhinoos Razavi Hesabi
AU  - Timos Sellis
AU  - Xiuzhen Zhang
BT  - Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
DA  - 2015/08/31
ED  - Wei Fan
ED  - Albert Bifet
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v41-razavi15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 41
SP  - 19
EP  - 32
L1  - http://proceedings.mlr.press/v41/razavi15.pdf
UR  - https://proceedings.mlr.press/v41/razavi15.html
AB  - With the advancement of data generation technologies such as sensor networks, multiple data streams are continuously generated.  Clustering multiple data streams is challenging as the requirement of clustering at anytime becomes more critical. We aim to cluster multiple data streams concurrently and in this paper we report our work in progress. ClusTree is an anytime clustering algorithm for a single stream. It uses a hierarchical tree structure to index micro-clusters, which are summary statistics for streaming data objects. We design a dynamic, concurrent indexing tree structure that extends the ClusTree structure  to achieve more granular micro clusters (summaries) of multiple streams  at any time. We devised algorithms to search, expand and update the hierarchical tree structure of storing micro clusters concurrently, along with an algorithm for anytime concurrent clustering of multiple streams.  As this is work in progress, we plan to test our proposed algorithms, on sensor data sets, and evaluate the space and time complexity of creating and accessing micro-clusters. We will also evaluate the quality of clustering in terms of number of created clusters and compare our technique with other approaches. 
ER  -

APA


Razavi Hesabi, Z., Sellis, T., & Zhang, X. (2015). Anytime Concurrent Clustering of Multiple Streams with an Indexing Tree. Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in Proceedings of Machine Learning Research 41:19-32. Available from https://proceedings.mlr.press/v41/razavi15.html.
