Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware

Dirk Habich; Stefanie Gahrig; Wolfgang Lehner

Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 36:104-119, 2014.

Abstract

Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.

Cite this Paper

BibTeX


@InProceedings{pmlr-v36-habich14,
  title = 	 {Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware},
  author = 	 {Habich, Dirk and Gahrig, Stefanie and Lehner, Wolfgang},
  booktitle = 	 {Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications},
  pages = 	 {104--119},
  year = 	 {2014},
  editor = 	 {Fan, Wei and Bifet, Albert and Yang, Qiang and Yu, Philip S.},
  volume = 	 {36},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {24 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v36/habich14.pdf},
  url = 	 {https://proceedings.mlr.press/v36/habich14.html},
  abstract = 	 {Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.}
}

Endnote

%0 Conference Paper
%T Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware
%A Dirk Habich
%A Stefanie Gahrig
%A Wolfgang Lehner
%B Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
%C Proceedings of Machine Learning Research
%D 2014
%E Wei Fan
%E Albert Bifet
%E Qiang Yang
%E Philip S. Yu
%F pmlr-v36-habich14
%I PMLR
%P 104--119
%U https://proceedings.mlr.press/v36/habich14.html
%V 36
%X Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.

RIS


TY  - CPAPER
TI  - Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware
AU  - Dirk Habich
AU  - Stefanie Gahrig
AU  - Wolfgang Lehner
BT  - Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
DA  - 2014/08/13
ED  - Wei Fan
ED  - Albert Bifet
ED  - Qiang Yang
ED  - Philip S. Yu
ID  - pmlr-v36-habich14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 36
SP  - 104
EP  - 119
L1  - http://proceedings.mlr.press/v36/habich14.pdf
UR  - https://proceedings.mlr.press/v36/habich14.html
AB  - Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.
ER  -

APA


Habich, D., Gahrig, S. & Lehner, W. (2014). Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware. Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in Proceedings of Machine Learning Research 36:104-119. Available from https://proceedings.mlr.press/v36/habich14.html.
