RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations

Zirui Liu; Chen Shengyuan; Kaixiong Zhou; Daochen Zha; Xiao Huang; Xia Hu

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:21951-21968, 2023.

Abstract

Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on commodity hardware. Prior art successfully reduces the computation cost of dense-matrix-based operations (e.g., convolution and linear) via sampling-based approximation. However, unlike dense matrices, sparse matrices are stored in an irregular data format, so each row or column may have a different number of non-zero entries. Thus, compared to the dense counterpart, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sampling sparse matrices is much less efficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the computation resource allocation layer-wise and epoch-wise. For the first challenge, we customize the computation resource for different sparse operations while keeping the total resource usage below a certain budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. We call the resulting approach Randomized Sparse Computation (RSC). In practice, RSC can achieve up to 11.6X speedup for a single sparse operation and 1.6X end-to-end wall-clock time speedup with almost no accuracy drop.
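
The "sampling-based approximation" mentioned in the abstract can be illustrated with classical column-row sampling for a matrix product. The sketch below is not the RSC implementation: it is a minimal, dense, pure-Python illustration of the general idea, and the function name `sampled_matmul`, the list-of-lists matrix representation, and the norm-proportional sampling probabilities are assumptions chosen for the example. It estimates A @ B from k sampled column-row pairs, rescaling each sampled term so that the estimate is unbiased.

```python
import random


def sampled_matmul(A, B, k, rng):
    """Estimate A @ B from k sampled column-row pairs (hypothetical helper).

    Pair i (column i of A, row i of B) is drawn with probability
    proportional to ||A[:, i]|| * ||B[i, :]||, and each drawn term is
    scaled by 1 / (k * p_i) so the expected value equals A @ B.
    """
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])

    # Euclidean norms of A's columns and B's rows.
    col_norm = [sum(A[r][i] ** 2 for r in range(n_rows)) ** 0.5
                for i in range(n_inner)]
    row_norm = [sum(x * x for x in B[i]) ** 0.5 for i in range(n_inner)]
    weights = [c * r for c, r in zip(col_norm, row_norm)]
    total = sum(weights)

    out = [[0.0] * n_cols for _ in range(n_rows)]
    if total == 0.0:
        return out  # the exact product is the zero matrix

    probs = [w / total for w in weights]
    # Sample k indices with replacement; zero-weight pairs are never drawn.
    for i in rng.choices(range(n_inner), weights=probs, k=k):
        scale = 1.0 / (k * probs[i])
        for r in range(n_rows):
            a = A[r][i]
            if a == 0.0:
                continue  # skip zero entries, as a sparse kernel would
            for c in range(n_cols):
                out[r][c] += scale * a * B[i][c]
    return out
```

A smaller k means fewer column-row pairs are multiplied, which lowers the cost at the price of higher variance. For a sparse left operand the actual saving depends on how many non-zeros the sampled columns contain, which is the first challenge the abstract describes.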

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-liu23ad,
  title = 	 {{RSC}: Accelerate Graph Neural Networks Training via Randomized Sparse Computations},
  author =       {Liu, Zirui and Shengyuan, Chen and Zhou, Kaixiong and Zha, Daochen and Huang, Xiao and Hu, Xia},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {21951--21968},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/liu23ad/liu23ad.pdf},
  url = 	 {https://proceedings.mlr.press/v202/liu23ad.html},
  abstract = 	 {Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on commodity hardware. Prior art successfully reduces the computation cost of dense-matrix-based operations (e.g., convolution and linear) via sampling-based approximation. However, unlike dense matrices, sparse matrices are stored in an irregular data format, so each row or column may have a different number of non-zero entries. Thus, compared to the dense counterpart, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sampling sparse matrices is much less efficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the computation resource allocation layer-wise and epoch-wise. For the first challenge, we customize the computation resource for different sparse operations while keeping the total resource usage below a certain budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. We call the resulting approach Randomized Sparse Computation (RSC). In practice, RSC can achieve up to 11.6X speedup for a single sparse operation and 1.6X end-to-end wall-clock time speedup with almost no accuracy drop.}
}

Endnote

%0 Conference Paper
%T RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations
%A Zirui Liu
%A Chen Shengyuan
%A Kaixiong Zhou
%A Daochen Zha
%A Xiao Huang
%A Xia Hu
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-liu23ad
%I PMLR
%P 21951--21968
%U https://proceedings.mlr.press/v202/liu23ad.html
%V 202
%X Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on commodity hardware. Prior art successfully reduces the computation cost of dense-matrix-based operations (e.g., convolution and linear) via sampling-based approximation. However, unlike dense matrices, sparse matrices are stored in an irregular data format, so each row or column may have a different number of non-zero entries. Thus, compared to the dense counterpart, approximating sparse operations poses two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation, since the computation is only executed on non-zero entries; (2) sampling sparse matrices is much less efficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the computation resource allocation layer-wise and epoch-wise. For the first challenge, we customize the computation resource for different sparse operations while keeping the total resource usage below a certain budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. We call the resulting approach Randomized Sparse Computation (RSC). In practice, RSC can achieve up to 11.6X speedup for a single sparse operation and 1.6X end-to-end wall-clock time speedup with almost no accuracy drop.

APA


Liu, Z., Shengyuan, C., Zhou, K., Zha, D., Huang, X. & Hu, X. (2023). RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:21951-21968. Available from https://proceedings.mlr.press/v202/liu23ad.html.
