HetGSMOTE: Oversampling for Heterogeneous Graphs

Adhilsha Ansad, Deependra Singh, Rucha Bhalchandra Joshi, Subhankar Mishra
Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), PMLR 307:1-14, 2026.

Abstract

Graph Neural Networks (GNNs) have proven effective for learning from graph-structured data, with heterogeneous graphs (HetGs) gaining particular prominence for their ability to model diverse real-world systems through multiple node and edge types. However, class imbalance, where certain node classes are significantly underrepresented, presents a critical challenge for node classification tasks on HetGs, as traditional learning approaches fail to adequately handle minority classes. This work introduces HetGSMOTE, a novel oversampling framework that extends SMOTE-based techniques to heterogeneous graph settings by systematically incorporating node-type, edge-type, and metapath information into the synthetic sample generation process. HetGSMOTE operates by constructing a content-aggregated and neighbor-type-aggregated embedding space through a base model, then generating synthetic minority nodes while training specialized edge generators for each node type to preserve essential relational structures. Through comprehensive experiments across multiple benchmark datasets and base models, we demonstrate that HetGSMOTE consistently outperforms existing baseline methods, achieving substantial improvements in classification performance under various imbalance scenarios, particularly in extreme imbalance cases, while maintaining broad compatibility across different heterogeneous graph neural network architectures. We release our code and data preparations at [github.com/smlab-niser/hetgsmote](https://github.com/smlab-niser/hetgsmote).
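The synthetic-node-generation step described above builds on classic SMOTE interpolation: each new minority sample is placed on the line segment between a real minority node's embedding and one of its nearest minority-class neighbors. The sketch below illustrates only that interpolation step in an already-computed embedding space; the function name, parameters, and the use of plain Euclidean nearest neighbors are illustrative assumptions, not the paper's actual implementation (which additionally conditions on node types and trains per-type edge generators).

```python
import numpy as np

def smote_interpolate(minority_emb, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling in embedding space: each synthetic
    point lies on the segment between a minority embedding and one
    of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n, dim = minority_emb.shape
    k = min(k, n - 1)
    # pairwise Euclidean distances among minority embeddings
    dists = np.linalg.norm(minority_emb[:, None] - minority_emb[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-matches
    # indices of the k nearest minority neighbors for each node
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_synthetic, dim))
    for i in range(n_synthetic):
        src = rng.integers(n)                 # pick a random minority node
        nbr = neighbors[src, rng.integers(k)] # pick one of its neighbors
        lam = rng.random()                    # interpolation factor in [0, 1)
        synthetic[i] = minority_emb[src] + lam * (minority_emb[nbr] - minority_emb[src])
    return synthetic

# toy example: three 2-D minority embeddings, four synthetic samples
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_nodes = smote_interpolate(emb, n_synthetic=4, k=2, rng=0)
print(new_nodes.shape)  # (4, 2)
```

In the full framework, each synthetic embedding would then be connected to the graph by the trained edge generators rather than used in isolation.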

Cite this Paper


BibTeX
@InProceedings{pmlr-v307-ansad26a,
  title     = {Het{GSMOTE}: Oversampling for Heterogeneous Graphs},
  author    = {Ansad, Adhilsha and Singh, Deependra and Joshi, Rucha Bhalchandra and Mishra, Subhankar},
  booktitle = {Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)},
  pages     = {1--14},
  year      = {2026},
  editor    = {Kim, Hyeongji and Ramírez Rivera, Adín and Ricaud, Benjamin},
  volume    = {307},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--08 Jan},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v307/main/assets/ansad26a/ansad26a.pdf},
  url       = {https://proceedings.mlr.press/v307/ansad26a.html},
  abstract  = {Graph Neural Networks (GNNs) have proven effective for learning from graph structured data, with heterogeneous graphs (HetGs) gaining particular prominence for their ability to model diverse real world systems through multiple node and edge types. However, class imbalance where certain node classes are significantly underrepresented presents a critical challenge for node classification tasks on HetGs, as traditional learning approaches fail to adequately handle minority classes. This work introduces HetGSMOTE, a novel oversampling framework that extends SMOTE-based techniques to heterogeneous graph settings by systematically incorporating node-type, edge-type, and metapath information into the synthetic sample generation process. HetGSMOTE operates by constructing a content-aggregated and neighbor-type-aggregated embedding space through a base model, then generating synthetic minority nodes while training specialized edge generators for each node type to preserve essential relational structures. Through comprehensive experiments across multiple benchmark datasets and base models, we demonstrate that HetGSMOTE consistently outperforms existing baseline methods, achieving substantial improvements in classification performance under various imbalance scenarios, particularly in extreme imbalance cases while maintaining broad compatibility across different heterogeneous graph neural network architectures. We release our code and data preparations at [github.com/smlab-niser/hetgsmote](https://github.com/smlab-niser/hetgsmote).}
}
Endnote
%0 Conference Paper
%T HetGSMOTE: Oversampling for Heterogeneous Graphs
%A Adhilsha Ansad
%A Deependra Singh
%A Rucha Bhalchandra Joshi
%A Subhankar Mishra
%B Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL)
%C Proceedings of Machine Learning Research
%D 2026
%E Hyeongji Kim
%E Adín Ramírez Rivera
%E Benjamin Ricaud
%F pmlr-v307-ansad26a
%I PMLR
%P 1--14
%U https://proceedings.mlr.press/v307/ansad26a.html
%V 307
%X Graph Neural Networks (GNNs) have proven effective for learning from graph structured data, with heterogeneous graphs (HetGs) gaining particular prominence for their ability to model diverse real world systems through multiple node and edge types. However, class imbalance where certain node classes are significantly underrepresented presents a critical challenge for node classification tasks on HetGs, as traditional learning approaches fail to adequately handle minority classes. This work introduces HetGSMOTE, a novel oversampling framework that extends SMOTE-based techniques to heterogeneous graph settings by systematically incorporating node-type, edge-type, and metapath information into the synthetic sample generation process. HetGSMOTE operates by constructing a content-aggregated and neighbor-type-aggregated embedding space through a base model, then generating synthetic minority nodes while training specialized edge generators for each node type to preserve essential relational structures. Through comprehensive experiments across multiple benchmark datasets and base models, we demonstrate that HetGSMOTE consistently outperforms existing baseline methods, achieving substantial improvements in classification performance under various imbalance scenarios, particularly in extreme imbalance cases while maintaining broad compatibility across different heterogeneous graph neural network architectures. We release our code and data preparations at [github.com/smlab-niser/hetgsmote](https://github.com/smlab-niser/hetgsmote).
APA
Ansad, A., Singh, D., Joshi, R.B. & Mishra, S. (2026). HetGSMOTE: Oversampling for Heterogeneous Graphs. Proceedings of the 7th Northern Lights Deep Learning Conference (NLDL), in Proceedings of Machine Learning Research 307:1-14. Available from https://proceedings.mlr.press/v307/ansad26a.html.
