Adaptive Stream Clustering Using Incremental Graph Maintenance

Marwan Hassani; Pascal Spaus; Alfredo Cuzzocrea; Thomas Seidl

Adaptive Stream Clustering Using Incremental Graph Maintenance

Marwan Hassani, Pascal Spaus, Alfredo Cuzzocrea, Thomas Seidl

Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 41:49-64, 2015.

Abstract

Challenges for clustering streaming data are getting continuously more sophisticated. This trend is driven by the the emerging requirements of the application where those algorithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously in the process of its existence, it is crucial that the algorithm autonomously detects clusters of arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability, called HASTREAM, was proposed. Although the algorithm was able to meet the above mentioned requirements, it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose \textitI-HASTREAM, a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster than a state-of-the-art hierarchical density-based stream clustering approach while delivering almost the same clustering quality.

Cite this Paper

BibTeX


@InProceedings{pmlr-v41-hassani15,
  title = 	 {Adaptive Stream Clustering Using Incremental Graph Maintenance},
  author = 	 {Hassani, Marwan and Spaus, Pascal and Cuzzocrea, Alfredo and Seidl, Thomas},
  booktitle = 	 {Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications},
  pages = 	 {49--64},
  year = 	 {2015},
  editor = 	 {Fan, Wei and Bifet, Albert and Yang, Qiang and Yu, Philip S.},
  volume = 	 {41},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {10 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v41/hassani15.pdf},
  url = 	 {https://proceedings.mlr.press/v41/hassani15.html},
  abstract = 	 {Challenges for clustering streaming data are getting continuously more sophisticated.  This trend is driven by the the emerging requirements of the application where those algorithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence  of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously  in the process of its existence, it is crucial that the algorithm autonomously detects clusters of  arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability,  called HASTREAM, was proposed. Although the algorithm was able to meet the above mentioned requirements,  it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose \textitI-HASTREAM, a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from  the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing  the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster  than a state-of-the-art hierarchical density-based stream clustering approach while delivering almost the same clustering quality.}
}

Endnote

%0 Conference Paper
%T Adaptive Stream Clustering Using Incremental Graph Maintenance
%A Marwan Hassani
%A Pascal Spaus
%A Alfredo Cuzzocrea
%A Thomas Seidl
%B Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
%C Proceedings of Machine Learning Research
%D 2015
%E Wei Fan
%E Albert Bifet
%E Qiang Yang
%E Philip S. Yu	
%F pmlr-v41-hassani15
%I PMLR
%P 49--64
%U https://proceedings.mlr.press/v41/hassani15.html
%V 41
%X Challenges for clustering streaming data are getting continuously more sophisticated.  This trend is driven by the the emerging requirements of the application where those algorithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence  of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously  in the process of its existence, it is crucial that the algorithm autonomously detects clusters of  arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability,  called HASTREAM, was proposed. Although the algorithm was able to meet the above mentioned requirements,  it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose \textitI-HASTREAM, a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from  the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing  the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster  than a state-of-the-art hierarchical density-based stream clustering approach while delivering almost the same clustering quality.

RIS


TY  - CPAPER
TI  - Adaptive Stream Clustering Using Incremental Graph Maintenance
AU  - Marwan Hassani
AU  - Pascal Spaus
AU  - Alfredo Cuzzocrea
AU  - Thomas Seidl
BT  - Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
DA  - 2015/08/31
ED  - Wei Fan
ED  - Albert Bifet
ED  - Qiang Yang
ED  - Philip S. Yu	
ID  - pmlr-v41-hassani15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 41
SP  - 49
EP  - 64
L1  - http://proceedings.mlr.press/v41/hassani15.pdf
UR  - https://proceedings.mlr.press/v41/hassani15.html
AB  - Challenges for clustering streaming data are getting continuously more sophisticated.  This trend is driven by the the emerging requirements of the application where those algorithms are used and the properties of the stream itself. Some of these properties are the continuous data arrival, the time-critical processing of objects, the evolution of the data streams, the presence  of outliers and the varying densities of the data. Due to the fact that the stream evolves continuously  in the process of its existence, it is crucial that the algorithm autonomously detects clusters of  arbitrary shape, with different densities, and varying number of clusters. Recently, the first hierarchical density-based stream clustering algorithm based on cluster stability,  called HASTREAM, was proposed. Although the algorithm was able to meet the above mentioned requirements,  it inherited the main drawback of density-based hierarchical clustering algorithms, namely the efficiency issues. In this paper we propose \textitI-HASTREAM, a first density-based hierarchical clustering algorithm that has considerably less computational time than HASTREAM. Our proposed method utilizes and introduces techniques from  the graph theory domain to devise an incremental update of the underlying model instead of repeatedly performing  the expensive calculations of the huge graph. Specifically the Prim’s algorithm for constructing the minimal spanning tree is adopted by introducing novel, incremental maintenance of the tree by vertex and edge insertion and deletion. The extensive experimental evaluation study on real world datasets shows that I-HASTREAM is considerably faster  than a state-of-the-art hierarchical density-based stream clustering approach while delivering almost the same clustering quality.
ER  -

APA


Hassani, M., Spaus, P., Cuzzocrea, A. & Seidl, T.. (2015). Adaptive Stream Clustering Using Incremental Graph Maintenance. Proceedings of the 4th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in Proceedings of Machine Learning Research 41:49-64 Available from https://proceedings.mlr.press/v41/hassani15.html.

Related Material

Download PDF