DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs

Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang
Proceedings of the Third Learning on Graphs Conference, PMLR 269:19:1-19:13, 2025.

Abstract

Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to 7.0× over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of 2.16× in end-to-end training compared to the popular GNN computing framework DGL.
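To make the fusion motivation concrete, the snippet below is a minimal PyTorch sketch of the GAT-style attention pipeline the abstract refers to, written as it is typically executed without fusion: an SDDMM-like per-edge scoring step, an edge softmax, and an SpMM-like aggregation. The function and parameter names are illustrative assumptions, not DF-GNN's API; each step launches separate kernels and materializes edge-sized intermediates, which is the data-movement and launch overhead that a fused kernel avoids.

```python
import torch

def gat_attention_unfused(src, dst, h, a_src, a_dst):
    """GAT-style edge attention as separate (unfused) steps:
    SDDMM-like scoring -> edge softmax -> SpMM-like aggregation.

    src, dst     : LongTensor [E]    edge endpoints (dst aggregates from src)
    h            : FloatTensor [N, F] node features
    a_src, a_dst : FloatTensor [F]    attention parameters
    """
    # SDDMM-like step: one attention score per edge (materializes an [E] tensor).
    e = (h[src] @ a_src) + (h[dst] @ a_dst)
    e = torch.nn.functional.leaky_relu(e, 0.2)

    # Edge softmax over the incoming edges of each destination node.
    N = h.size(0)
    e_max = torch.full((N,), float("-inf"), device=e.device)
    e_max = e_max.scatter_reduce(0, dst, e, reduce="amax", include_self=True)
    e_exp = torch.exp(e - e_max[dst])
    denom = torch.zeros(N, device=e.device).index_add_(0, dst, e_exp)
    alpha = e_exp / denom[dst].clamp_min(1e-16)

    # SpMM-like step: weighted aggregation of neighbor features.
    out = torch.zeros_like(h).index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
    return out
```

Every intermediate here (edge scores, exponentials, normalizers, attention weights) is written to and read back from global memory between kernels; a fused kernel can keep them in shared memory, which is the benefit the dynamic bi-level thread scheduling strategy is designed to preserve while still adapting the schedule per operation and per graph (e.g., in the presence of super nodes).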

Cite this Paper


BibTeX
@InProceedings{pmlr-v269-liu25a,
  title     = {DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs},
  author    = {Liu, Jiahui and Cai, Zhenkun and Chen, Zhiyong and Wang, Minjie},
  booktitle = {Proceedings of the Third Learning on Graphs Conference},
  pages     = {19:1--19:13},
  year      = {2025},
  editor    = {Wolf, Guy and Krishnaswamy, Smita},
  volume    = {269},
  series    = {Proceedings of Machine Learning Research},
  month     = {26--29 Nov},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v269/main/assets/liu25a/liu25a.pdf},
  url       = {https://proceedings.mlr.press/v269/liu25a.html},
  abstract  = {Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to $7.0\times$ over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of $2.16\times$ in end-to-end training compared to the popular GNN computing framework DGL.}
}
Endnote
%0 Conference Paper
%T DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
%A Jiahui Liu
%A Zhenkun Cai
%A Zhiyong Chen
%A Minjie Wang
%B Proceedings of the Third Learning on Graphs Conference
%C Proceedings of Machine Learning Research
%D 2025
%E Guy Wolf
%E Smita Krishnaswamy
%F pmlr-v269-liu25a
%I PMLR
%P 19:1--19:13
%U https://proceedings.mlr.press/v269/liu25a.html
%V 269
%X Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to 7.0× over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of 2.16× in end-to-end training compared to the popular GNN computing framework DGL.
APA
Liu, J., Cai, Z., Chen, Z. & Wang, M. (2025). DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs. Proceedings of the Third Learning on Graphs Conference, in Proceedings of Machine Learning Research 269:19:1-19:13. Available from https://proceedings.mlr.press/v269/liu25a.html.