A Scalable Heterogeneous Parallel SOM Based on MPI/CUDA
Proceedings of The 10th Asian Conference on Machine Learning, PMLR 95:264-279, 2018.
Self-Organizing Map (SOM) is a kind of artificial neural network used in unsupervised machine learning; it is widely applied to clustering, dimensionality reduction and visualization of high-dimensional data. There are two major versions of the training algorithm: the original (online) algorithm and the batch algorithm. Compared with the original, the batch algorithm converges faster, requires less computation, and is well suited to parallelization. However, it still faces efficiency challenges when the data are massive or high-dimensional, or when the map is large. In this paper, we propose a scalable heterogeneous parallel SOM based on the batch algorithm that combines process-level parallelism via MPI with thread-level parallelism via CUDA. To improve parallel efficiency on GPUs and make full use of their high floating-point throughput, we reformulate the two most time-consuming steps, the computation of best matching units and the weight update, as matrix operations so that they can be implemented with cuBLAS. Memory optimization methods are also adopted. The experiments show that the proposed heterogeneous parallel SOM is effective, efficient and scalable.
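To illustrate what "matrix operations for best matching units and weight update" can look like, here is a minimal pure-Python sketch of one batch-SOM epoch. It assumes squared Euclidean distance, a Gaussian neighborhood on a 2-D map grid, and a common matrix formulation that is not necessarily the paper's exact one. The BMU search uses the expansion ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2, whose cross term is a single matrix product. The update uses a one-hot assignment matrix B and a neighborhood matrix H, so each new weight vector is a row of (H B) X divided by the corresponding row sum of H B. All function names (`matmul`, `find_bmus`, `batch_update`, etc.) are hypothetical. `matmul` merely stands in for a dense GEMM call such as cuBLAS `cublasSgemm`; no MPI or CUDA is involved here.

```python
import math

def matmul(A, B):
    """Dense product A @ B on lists of lists (hypothetical stand-in for a GEMM call)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def find_bmus(X, W):
    """BMU per sample via ||x||^2 - 2 x.w + ||w||^2; the cross term is one GEMM."""
    XWt = matmul(X, transpose(W))               # n x m matrix of dot products x_i . w_j
    w_sq = [sum(v * v for v in w) for w in W]   # ||w_j||^2 for every unit
    bmus = []
    for row in XWt:
        # ||x_i||^2 is the same for all units, so it cannot change the argmin
        scores = [w_sq[j] - 2.0 * row[j] for j in range(len(W))]
        bmus.append(min(range(len(W)), key=scores.__getitem__))
    return bmus

def neighborhood(grid, sigma):
    """Gaussian neighborhood H[j][k] = exp(-||r_j - r_k||^2 / (2 sigma^2)) on grid coords."""
    return [[math.exp(-((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) / (2.0 * sigma ** 2))
             for b in grid] for a in grid]

def batch_update(X, W, grid, sigma):
    """One batch epoch: w_j = sum_i h_{j,c(i)} x_i / sum_i h_{j,c(i)}."""
    n, m = len(X), len(W)
    bmus = find_bmus(X, W)
    # One-hot assignment matrix B (m x n): B[k][i] = 1 if sample i maps to unit k
    B = [[1.0 if bmus[i] == k else 0.0 for i in range(n)] for k in range(m)]
    HB = matmul(neighborhood(grid, sigma), B)   # m x n, entry [j][i] = h_{j, c(i)}
    num = matmul(HB, X)                         # m x d weighted sums of samples
    den = [sum(row) for row in HB]              # m normalizers
    new_W = [[v / den[j] for v in num[j]] if den[j] > 0 else list(W[j])
             for j in range(m)]
    return new_W, bmus

# Tiny demo: two clusters, a 1 x 2 map, and a narrow neighborhood
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
W = [[0.0, 0.0], [1.0, 1.0]]
grid = [(0, 0), (0, 1)]
W2, bmus = batch_update(X, W, grid, sigma=0.1)
print(bmus)
print(W2)
```

With sigma = 0.1 the neighbor weight is about exp(-50), so each unit moves almost exactly to the mean of the samples it wins: bmus is `[0, 0, 1, 1]` and the weights become approximately `[0.05, 0.0]` and `[0.95, 1.0]`. Writing both steps as matrix products is what lets a GPU library carry out the bulk of the arithmetic in a few large GEMM calls instead of many small per-sample loops.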