A Scalable Heterogeneous Parallel SOM Based on MPI/CUDA

Yao Liu, Jun Sun, Qing Yao, Su Wang, Kai Zheng, Yan Liu
Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:264-279, 2018.

Abstract

The Self-Organizing Map (SOM) is an artificial neural network used in unsupervised machine learning and widely applied to clustering, dimension reduction and visualization of high-dimensional data. There are two major versions of the training algorithm: the original algorithm and the batch algorithm. Compared with the original, the batch algorithm converges faster, requires less computation, and is suitable for parallelization. However, it still faces an efficiency challenge in the case of massive data, high-dimensional data or a large-scale map. In this paper, a scalable heterogeneous parallel SOM based on the batch algorithm is proposed, which combines process-level and thread-level parallelism through MPI and CUDA. To boost parallel efficiency on GPUs and make full use of their high floating-point computing capability, we design matrix operations for the most time-consuming steps, the computation of best match units and the weights update, so that these steps can be implemented with cuBLAS. In addition, memory optimization methods are adopted. The experiments show that the proposed heterogeneous parallel SOM is effective, efficient and scalable.
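
The abstract does not spell out the matrix operations, so the following is only a minimal sketch of one common way to express the two steps it names (best match unit search and batch weights update) as matrix products. It is not taken from the paper: NumPy on the CPU stands in for cuBLAS, there is no MPI layer, and the function names, the Gaussian neighborhood, and the `grid` and `sigma` parameters are assumptions made for illustration.

```python
import numpy as np


def bmu_indices(X, W):
    """Best match unit index for each row of X (n x d) against weights W (m x d).

    Uses ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2. The ||x||^2 term is constant
    per sample, so it is dropped before the argmin. The cross term is a single
    matrix product, the kind of operation a GEMM routine would handle on a GPU.
    """
    cross = X @ W.T                        # (n, m) matrix product
    w_norm = np.einsum("ij,ij->i", W, W)   # (m,) squared norms of the weights
    return np.argmin(w_norm - 2.0 * cross, axis=1)


def batch_update(X, W, grid, sigma):
    """One batch-SOM epoch written with matrix products (illustrative only).

    grid  : (m, g) map coordinates of the m units (assumed layout)
    sigma : width of an assumed Gaussian neighborhood function
    """
    n, m = X.shape[0], W.shape[0]
    bmu = bmu_indices(X, W)

    # One-hot assignment matrix: A[i, k] = 1 if unit k is the BMU of sample i.
    A = np.zeros((n, m))
    A[np.arange(n), bmu] = 1.0

    # Neighborhood matrix H[j, k] from squared grid distances between units.
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    H = np.exp(-d2 / (2.0 * sigma ** 2))

    # w_j = sum_i H[j, c(i)] x_i / sum_i H[j, c(i)], with c(i) the BMU of x_i.
    numerator = H @ (A.T @ X)        # (m, d)
    denominator = H @ A.sum(axis=0)  # (m,), positive because H > 0 everywhere
    return numerator / denominator[:, None]
```

In a multi-process setting, one plausible arrangement is for each process to compute `A.T @ X` and `A.sum(axis=0)` on its own shard of the data and for these partial sums to be reduced across processes before the division; whether the paper does this is not stated in the abstract.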

Cite this Paper


BibTeX
@InProceedings{pmlr-v95-liu18b,
  title = {A Scalable Heterogeneous Parallel SOM Based on MPI/CUDA},
  author = {Liu, Yao and Sun, Jun and Yao, Qing and Wang, Su and Zheng, Kai and Liu, Yan},
  booktitle = {Proceedings of The 10th Asian Conference on Machine Learning},
  pages = {264--279},
  year = {2018},
  editor = {Zhu, Jun and Takeuchi, Ichiro},
  volume = {95},
  series = {Proceedings of Machine Learning Research},
  month = {14--16 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v95/liu18b/liu18b.pdf},
  url = {https://proceedings.mlr.press/v95/liu18b.html},
  abstract = {Self-Organizing Map (SOM) is a kind of artificial neural network used in unsupervised machine learning, which is widely applied to clustering, dimension reduction and visualization for high-dimensional data, etc. There are two major versions of the training algorithm: original algorithm and batch algorithm. Compared with the original, the batch algorithm has some advantages including faster convergence and less computation, and is suitable for parallelization. However, it is still confronted with the challenge of efficiency in the case of massive data, high-dimensional data or a large-scale map. In this paper, a scalable heterogeneous parallel SOM based on the batch algorithm is proposed which combines process-level and thread-level parallelism by MPI and CUDA. To boost the parallel efficiency on GPUs and make full use of the high floating-point computing capability, we design matrix operations for the most time-consuming steps, the computation of best match units and weights update, making the steps available for the implementation by cuBLAS. In addition, the memory optimization methods are adopted. The experiments show that the proposed heterogeneous parallel SOM is effective, efficient and scalable.}
}
Endnote
%0 Conference Paper
%T A Scalable Heterogeneous Parallel SOM Based on MPI/CUDA
%A Yao Liu
%A Jun Sun
%A Qing Yao
%A Su Wang
%A Kai Zheng
%A Yan Liu
%B Proceedings of The 10th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2018
%E Jun Zhu
%E Ichiro Takeuchi
%F pmlr-v95-liu18b
%I PMLR
%P 264--279
%U https://proceedings.mlr.press/v95/liu18b.html
%V 95
%X Self-Organizing Map (SOM) is a kind of artificial neural network used in unsupervised machine learning, which is widely applied to clustering, dimension reduction and visualization for high-dimensional data, etc. There are two major versions of the training algorithm: original algorithm and batch algorithm. Compared with the original, the batch algorithm has some advantages including faster convergence and less computation, and is suitable for parallelization. However, it is still confronted with the challenge of efficiency in the case of massive data, high-dimensional data or a large-scale map. In this paper, a scalable heterogeneous parallel SOM based on the batch algorithm is proposed which combines process-level and thread-level parallelism by MPI and CUDA. To boost the parallel efficiency on GPUs and make full use of the high floating-point computing capability, we design matrix operations for the most time-consuming steps, the computation of best match units and weights update, making the steps available for the implementation by cuBLAS. In addition, the memory optimization methods are adopted. The experiments show that the proposed heterogeneous parallel SOM is effective, efficient and scalable.
APA
Liu, Y., Sun, J., Yao, Q., Wang, S., Zheng, K. & Liu, Y. (2018). A Scalable Heterogeneous Parallel SOM Based on MPI/CUDA. Proceedings of The 10th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 95:264-279. Available from https://proceedings.mlr.press/v95/liu18b.html.
