Towards Understanding Parametric Generalized Category Discovery on Graphs
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:13069-13109, 2025.
Abstract
Generalized Category Discovery (GCD) aims to identify both known and novel categories in unlabeled data by leveraging knowledge from old classes. However, existing methods are limited to non-graph data and lack the theoretical foundations to answer when and how known classes can help GCD. We introduce the Graph GCD task and provide the first rigorous theoretical analysis of parametric GCD. By quantifying the relationship between old and new classes in the embedding space using the Wasserstein distance W, we derive the first provable GCD loss bound based on W. This analysis highlights two necessary conditions for effective GCD. However, through a Pairwise Markov Random Field perspective, we uncover that popular graph contrastive learning (GCL) methods inherently violate these conditions. To address this limitation, we propose SWIRL, a novel GCL method for GCD. Experimental results validate our theoretical findings and demonstrate SWIRL’s effectiveness.
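To make the quantity W more concrete, the following is a minimal, illustrative sketch of an exact 1-Wasserstein distance between two small sets of embeddings, for example old-class and new-class node embeddings. It is not the paper's estimator or bound. The abstract does not specify the ground cost, the order p, or how the embeddings are obtained, so this sketch assumes Euclidean cost, p = 1, and two equally sized sets with uniform weights. The function names `euclidean` and `wasserstein1_uniform` are hypothetical. Under these assumptions an optimal transport plan can be taken to be a one-to-one matching (Birkhoff-von Neumann theorem), so the sketch brute-forces all matchings. This scales as O(n!) and is only suitable for toy inputs.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed ground cost)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wasserstein1_uniform(xs, ys):
    """Exact W1 between two uniform empirical distributions of equal size.

    Hypothetical helper for illustration only. With equal sizes and uniform
    weights, some optimal transport plan is a permutation, so we take the
    minimum average matched distance over all permutations.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point sets")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Average transport cost when point i of xs is sent to point perm[i] of ys.
        cost = sum(euclidean(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best


# Toy 2-D "embeddings": one old-class set and two candidate new-class sets.
old = [(0.0, 0.0), (1.0, 0.0)]
new_near = [(0.0, 0.5), (1.0, 0.5)]
new_far = [(0.0, 3.0), (1.0, 3.0)]

print(wasserstein1_uniform(old, new_near))  # 0.5
print(wasserstein1_uniform(old, new_far))   # 3.0
```

In this toy setting a larger W simply means the new-class embeddings sit farther from the old-class ones. How such a distance enters the paper's GCD loss bound and its two necessary conditions is not recoverable from the abstract alone.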