Data-Centric Graph Condensation via Diffusion Matching

Hengrui Zhang; Philip S. Yu

Proceedings of the 17th Asian Conference on Machine Learning, PMLR 304:447-462, 2025.

Abstract

This paper introduces Data-Centric Graph Condensation (DCGC), a task- and model-agnostic method that condenses a large graph into a smaller one by matching the distributions of the two graphs. DCGC defines the distribution of a graph as the trajectories of its node signals (such as node features and node labels) induced by a diffusion process over the geometric structure, which captures multi-order structural information. Building on this definition, DCGC compresses the topological knowledge of the original graph into a synthetic graph that is orders of magnitude smaller by aligning their distributions in input space. Unlike existing methods, which are tied to particular GNN architectures and require solving complicated optimization problems, DCGC can be applied to arbitrary off-the-shelf GNNs and performs graph condensation much faster. Beyond cross-architecture generalization and training efficiency, experiments show that DCGC consistently outperforms existing methods on datasets of varying scales and at varying condensation ratios.
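The abstract names two ingredients: diffusing node signals over the graph to obtain per-node trajectories, and matching the distribution of those trajectories between the original and the synthetic graph in input space. The NumPy sketch below illustrates both under assumptions that are ours, not the paper's: the diffusion operator is taken to be the GCN-style normalized adjacency with self-loops, D^{-1/2}(A + I)D^{-1/2}; node signals are node features concatenated with one-hot labels; the trajectory is truncated after a hypothetical `num_steps`; and the distribution distance is a Gaussian-kernel MMD. All helper names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Assumed diffusion operator: D^{-1/2} (A + I) D^{-1/2} (GCN-style)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def diffusion_trajectory(adj: np.ndarray, signals: np.ndarray, num_steps: int) -> np.ndarray:
    """Concatenate [S, P S, P^2 S, ..., P^K S] so each row is one node's trajectory."""
    p = normalized_adjacency(adj)
    steps = [signals]
    for _ in range(num_steps):
        steps.append(p @ steps[-1])  # one more hop of diffusion
    return np.concatenate(steps, axis=1)


def rbf_mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel (assumed distance)."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped at 0 against round-off.
        sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
    return float(kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean())


def make_signals(features: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Node signals = features concatenated with one-hot labels (assumed)."""
    return np.concatenate([features, np.eye(num_classes)[labels]], axis=1)


# Usage: a 4-node path graph as the "original" graph and a 2-node synthetic graph.
rng = np.random.default_rng(0)
adj_real = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
sig_real = make_signals(rng.normal(size=(4, 3)), np.array([0, 0, 1, 1]), num_classes=2)
adj_syn = np.array([[0, 1], [1, 0]], dtype=float)
sig_syn = make_signals(rng.normal(size=(2, 3)), np.array([0, 1]), num_classes=2)

traj_real = diffusion_trajectory(adj_real, sig_real, num_steps=2)  # shape (4, 15)
traj_syn = diffusion_trajectory(adj_syn, sig_syn, num_steps=2)     # shape (2, 15)
print("matching loss:", rbf_mmd2(traj_real, traj_syn))
```

The sketch only evaluates a matching objective; in a full condensation loop the synthetic features, labels, and adjacency would be the optimization variables, and a loss of this kind would be minimized with respect to them (for example with an autodiff framework). Because the loss compares input-space trajectories rather than GNN outputs or gradients, no GNN architecture appears in it, which is consistent with the model-agnostic property the abstract describes.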

Cite this Paper

BibTeX

@InProceedings{pmlr-v304-zhang25b,
  title     = {Data-Centric Graph Condensation via Diffusion Matching},
  author    = {Zhang, Hengrui and Yu, Philip S.},
  booktitle = {Proceedings of the 17th Asian Conference on Machine Learning},
  pages     = {447--462},
  year      = {2025},
  editor    = {Lee, Hung-yi and Liu, Tongliang},
  volume    = {304},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Dec},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v304/main/assets/zhang25b/zhang25b.pdf},
  url       = {https://proceedings.mlr.press/v304/zhang25b.html},
  abstract  = {This paper introduces Data-Centric Graph Condensation (named DCGC), a task- and model-agnostic method for condensing a large graph into a smaller one by matching the distribution between two graphs. DCGC defines the distribution of a graph as the trajectories of its node signals (such as node features and node labels) induced by a diffusion process over the geometric structure, which accommodates multi-order structural information. Built upon this, DCGC compresses the topological knowledge of the original graph into the orders-of-magnitude smaller synthetic one by aligning their distributions in input space. Compared with existing methods that stick to particular GNN architectures and require solving complicated optimization, DCGC can be flexibly applied to arbitrary off-the-shelf GNNs and achieve graph condensation with a much faster speed. Apart from the cross-architecture generalization ability and training efficiency, experiments demonstrate that DCGC yields consistently superior performance than existing methods on datasets with varying scales and condensation ratios.}
}

Endnote

%0 Conference Paper
%T Data-Centric Graph Condensation via Diffusion Matching
%A Hengrui Zhang
%A Philip S. Yu
%B Proceedings of the 17th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Hung-yi Lee
%E Tongliang Liu
%F pmlr-v304-zhang25b
%I PMLR
%P 447--462
%U https://proceedings.mlr.press/v304/zhang25b.html
%V 304
%X This paper introduces Data-Centric Graph Condensation (named DCGC), a task- and model-agnostic method for condensing a large graph into a smaller one by matching the distribution between two graphs. DCGC defines the distribution of a graph as the trajectories of its node signals (such as node features and node labels) induced by a diffusion process over the geometric structure, which accommodates multi-order structural information. Built upon this, DCGC compresses the topological knowledge of the original graph into the orders-of-magnitude smaller synthetic one by aligning their distributions in input space. Compared with existing methods that stick to particular GNN architectures and require solving complicated optimization, DCGC can be flexibly applied to arbitrary off-the-shelf GNNs and achieve graph condensation with a much faster speed. Apart from the cross-architecture generalization ability and training efficiency, experiments demonstrate that DCGC yields consistently superior performance than existing methods on datasets with varying scales and condensation ratios.

APA

Zhang, H., & Yu, P. S. (2025). Data-Centric Graph Condensation via Diffusion Matching. Proceedings of the 17th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 304:447-462. Available from https://proceedings.mlr.press/v304/zhang25b.html.