Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware

Dirk Habich, Stefanie Gahrig, Wolfgang Lehner
; Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, PMLR 36:104-119, 2014.

Abstract

Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.

Cite this Paper


BibTeX
@InProceedings{pmlr-v36-habich14, title = {Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware}, author = {Dirk Habich and Stefanie Gahrig and Wolfgang Lehner}, booktitle = {Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications}, pages = {104--119}, year = {2014}, editor = {Wei Fan and Albert Bifet and Qiang Yang and Philip S. Yu}, volume = {36}, series = {Proceedings of Machine Learning Research}, address = {New York, New York, USA}, month = {24 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v36/habich14.pdf}, url = {http://proceedings.mlr.press/v36/habich14.html}, abstract = {Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.} }
Endnote
%0 Conference Paper %T Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware %A Dirk Habich %A Stefanie Gahrig %A Wolfgang Lehner %B Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications %C Proceedings of Machine Learning Research %D 2014 %E Wei Fan %E Albert Bifet %E Qiang Yang %E Philip S. Yu %F pmlr-v36-habich14 %I PMLR %J Proceedings of Machine Learning Research %P 104--119 %U http://proceedings.mlr.press %V 36 %W PMLR %X Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next.
RIS
TY - CPAPER TI - Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware AU - Dirk Habich AU - Stefanie Gahrig AU - Wolfgang Lehner BT - Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications PY - 2014/08/13 DA - 2014/08/13 ED - Wei Fan ED - Albert Bifet ED - Qiang Yang ED - Philip S. Yu ID - pmlr-v36-habich14 PB - PMLR SP - 104 DP - PMLR EP - 119 L1 - http://proceedings.mlr.press/v36/habich14.pdf UR - http://proceedings.mlr.press/v36/habich14.html AB - Data Clustering is an important and highly utilized data mining technique in various application domains. With ever increasing data volumes in the era of big data, the efficient execution of clustering algorithms is a fundamental prerequisite to gain understanding and acquire novel, previously unknown knowledge from data. To establish an efficient execution, the clustering algorithms have to be re-engineered to fully exploit the provided hardware capabilities. Shared-memory multiprocessor systems like graphics processing units (GPUs) provide extremely high parallelism combined with a high bandwidth transfer at low cost. The availability of such computing units increases with upcoming processors, where a common CPU and various computing units, like GPU, are tightly coupled using a unified shared memory hierarchy. In this paper, we consider density-based clustering for such heterogeneous systems. In particular, we optimize the configuration of CUDA-DClust – a density-based clustering algorithm for GPUs – and show that our configuration approach enables an efficient and deterministic execution. Our configuration approach is based on data as well as hardware properties, so that we are able to adjust the algorithm execution in both directions. In our evaluation, we show the applicability of our approach and present open challenges which have to be solved next. ER -
APA
Habich, D., Gahrig, S. & Lehner, W.. (2014). Towards Optimal Execution of Density-based Clustering on Heterogeneous Hardware. Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, in PMLR 36:104-119

Related Material