Towards Understanding Parametric Generalized Category Discovery on Graphs

Bowen Deng, Lele Fu, Jialong Chen, Sheng Huang, Tianchi Liao, Zhang Tao, Chuan Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:13069-13109, 2025.

Abstract

Generalized Category Discovery (GCD) aims to identify both known and novel categories in unlabeled data by leveraging knowledge from old classes. However, existing methods are limited to non-graph data; lack theoretical foundations to answer When and how known classes can help GCD. We introduce the Graph GCD task; provide the first rigorous theoretical analysis of parametric GCD. By quantifying the relationship between old and new classes in the embedding space using the Wasserstein distance W, we derive the first provable GCD loss bound based on W. This analysis highlights two necessary conditions for effective GCD. However, we uncover, through a Pairwise Markov Random Field perspective, that popular graph contrastive learning (GCL) methods inherently violate these conditions. To address this limitation, we propose SWIRL, a novel GCL method for GCD. Experimental results validate our (theoretical) findings and demonstrate SWIRL’s effectiveness.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-deng25b, title = {Towards Understanding Parametric Generalized Category Discovery on Graphs}, author = {Deng, Bowen and Fu, Lele and Chen, Jialong and Huang, Sheng and Liao, Tianchi and Tao, Zhang and Chen, Chuan}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {13069--13109}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/deng25b/deng25b.pdf}, url = {https://proceedings.mlr.press/v267/deng25b.html}, abstract = {Generalized Category Discovery (GCD) aims to identify both known and novel categories in unlabeled data by leveraging knowledge from old classes. However, existing methods are limited to non-graph data; lack theoretical foundations to answer When and how known classes can help GCD. We introduce the Graph GCD task; provide the first rigorous theoretical analysis of parametric GCD. By quantifying the relationship between old and new classes in the embedding space using the Wasserstein distance W, we derive the first provable GCD loss bound based on W. This analysis highlights two necessary conditions for effective GCD. However, we uncover, through a Pairwise Markov Random Field perspective, that popular graph contrastive learning (GCL) methods inherently violate these conditions. To address this limitation, we propose SWIRL, a novel GCL method for GCD. Experimental results validate our (theoretical) findings and demonstrate SWIRL’s effectiveness.} }
Endnote
%0 Conference Paper %T Towards Understanding Parametric Generalized Category Discovery on Graphs %A Bowen Deng %A Lele Fu %A Jialong Chen %A Sheng Huang %A Tianchi Liao %A Zhang Tao %A Chuan Chen %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-deng25b %I PMLR %P 13069--13109 %U https://proceedings.mlr.press/v267/deng25b.html %V 267 %X Generalized Category Discovery (GCD) aims to identify both known and novel categories in unlabeled data by leveraging knowledge from old classes. However, existing methods are limited to non-graph data; lack theoretical foundations to answer When and how known classes can help GCD. We introduce the Graph GCD task; provide the first rigorous theoretical analysis of parametric GCD. By quantifying the relationship between old and new classes in the embedding space using the Wasserstein distance W, we derive the first provable GCD loss bound based on W. This analysis highlights two necessary conditions for effective GCD. However, we uncover, through a Pairwise Markov Random Field perspective, that popular graph contrastive learning (GCL) methods inherently violate these conditions. To address this limitation, we propose SWIRL, a novel GCL method for GCD. Experimental results validate our (theoretical) findings and demonstrate SWIRL’s effectiveness.
APA
Deng, B., Fu, L., Chen, J., Huang, S., Liao, T., Tao, Z. & Chen, C.. (2025). Towards Understanding Parametric Generalized Category Discovery on Graphs. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:13069-13109 Available from https://proceedings.mlr.press/v267/deng25b.html.

Related Material