DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs

Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang
Proceedings of the Third Learning on Graphs Conference, PMLR 269:19:1-19:13, 2025.

Abstract

Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to 7.0× over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of 2.16× in end-to-end training compared to the popular GNN computing framework DGL.
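To make the fusion motivation concrete, the snippet below is a minimal PyTorch sketch of the GAT-style attention pipeline the abstract refers to, written as it is typically executed without fusion: an SDDMM-like per-edge scoring step, an edge softmax, and an SpMM-like aggregation. The function and parameter names are illustrative assumptions, not DF-GNN's API; each step launches separate kernels and materializes edge-sized intermediates, which is the data-movement and launch overhead that a fused kernel avoids.

```python
import torch

def gat_attention_unfused(src, dst, h, a_src, a_dst):
    """GAT-style edge attention as separate (unfused) steps:
    SDDMM-like scoring -> edge softmax -> SpMM-like aggregation.

    src, dst     : LongTensor [E]    edge endpoints (dst aggregates from src)
    h            : FloatTensor [N, F] node features
    a_src, a_dst : FloatTensor [F]    attention parameters
    """
    # SDDMM-like step: one attention score per edge (materializes an [E] tensor).
    e = (h[src] @ a_src) + (h[dst] @ a_dst)
    e = torch.nn.functional.leaky_relu(e, 0.2)

    # Edge softmax over the incoming edges of each destination node.
    N = h.size(0)
    e_max = torch.full((N,), float("-inf"), device=e.device)
    e_max = e_max.scatter_reduce(0, dst, e, reduce="amax", include_self=True)
    e_exp = torch.exp(e - e_max[dst])
    denom = torch.zeros(N, device=e.device).index_add_(0, dst, e_exp)
    alpha = e_exp / denom[dst].clamp_min(1e-16)

    # SpMM-like step: weighted aggregation of neighbor features.
    out = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
    return out
```

Every intermediate here (edge scores, exponentials, normalizers, attention weights) is written to and read back from global memory between kernels; a fused kernel can keep them in shared memory, which is the benefit the dynamic bi-level thread scheduling strategy is designed to preserve while still adapting the schedule per operation and per graph (e.g., in the presence of super nodes).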

Cite this Paper


BibTeX
@InProceedings{pmlr-v269-liu25a,
  title     = {DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs},
  author    = {Liu, Jiahui and Cai, Zhenkun and Chen, Zhiyong and Wang, Minjie},
  booktitle = {Proceedings of the Third Learning on Graphs Conference},
  pages     = {19:1--19:13},
  year      = {2025},
  editor    = {Wolf, Guy and Krishnaswamy, Smita},
  volume    = {269},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--29 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v269/main/assets/liu25a/liu25a.pdf},
  url       = {https://proceedings.mlr.press/v269/liu25a.html},
  abstract  = {Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to $7.0\times$ over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of $2.16\times$ in end-to-end training compared to the popular GNN computing framework DGL.}
}
Endnote
%0 Conference Paper
%T DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
%A Jiahui Liu
%A Zhenkun Cai
%A Zhiyong Chen
%A Minjie Wang
%B Proceedings of the Third Learning on Graphs Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Guy Wolf
%E Smita Krishnaswamy
%F pmlr-v269-liu25a
%I PMLR
%P 19:1--19:13
%U https://proceedings.mlr.press/v269/liu25a.html
%V 269
%X Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to 7.0× over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of 2.16× in end-to-end training compared to the popular GNN computing framework DGL.
APA
Liu, J., Cai, Z., Chen, Z. & Wang, M. (2025). DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs. Proceedings of the Third Learning on Graphs Conference, in Proceedings of Machine Learning Research 269:19:1-19:13. Available from https://proceedings.mlr.press/v269/liu25a.html.