SANGEA: Scalable and Attributed Network Generation

Valentin Lemaire, Youssef Achenchabe, Lucas Ody, Houssem Eddine Souid, Gianmarco Aversano, Nicolas Posocco, Sabri Skhiri
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:678-693, 2024.

Abstract

The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling. However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in $\mathcal{O}(N^2)$, with $N$ being the number of nodes in the graph. For this reason, many state-of-the-art SGGs are not applicable to large graphs. In this paper, we present SANGEA, a sizeable synthetic graph generation framework which extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph. Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.
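
As a rough illustration of the scaling argument in the abstract, the sketch below counts the candidate undirected edges a generator would have to consider for a whole graph versus for each community separately. The graph size, the even split into communities, and the function names are hypothetical and not taken from the paper. The count also ignores inter-community edges, which SANGEA handles in its linking step; the abstract does not describe that step in detail.

```python
def candidate_pairs(n: int) -> int:
    """Number of unordered node pairs (possible undirected edges) among n nodes."""
    return n * (n - 1) // 2


def pairs_within_communities(sizes: list[int]) -> int:
    """Total candidate pairs when each community is treated on its own."""
    return sum(candidate_pairs(s) for s in sizes)


# Hypothetical example: 100,000 nodes split evenly into 100 communities.
n_nodes = 100_000
community_sizes = [1_000] * 100

full = candidate_pairs(n_nodes)                     # grows as O(N^2)
split = pairs_within_communities(community_sizes)   # intra-community pairs only

print(f"whole graph:     {full:,} candidate pairs")
print(f"per community:   {split:,} candidate pairs")
print(f"reduction ratio: {full / split:.1f}x")
```

Under these assumed sizes, the per-community count is about a hundred times smaller than the whole-graph count, which is the intuition behind training one SGG per community.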

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-lemaire24a,
  title = {{SANGEA}: {S}calable and Attributed Network Generation},
  author = {Lemaire, Valentin and Achenchabe, Youssef and Ody, Lucas and Souid, Houssem Eddine and Aversano, Gianmarco and Posocco, Nicolas and Skhiri, Sabri},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages = {678--693},
  year = {2024},
  editor = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume = {222},
  series = {Proceedings of Machine Learning Research},
  month = {11--14 Nov},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v222/lemaire24a/lemaire24a.pdf},
  url = {https://proceedings.mlr.press/v222/lemaire24a.html},
  abstract = {The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling. However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in $\mathcal{O}(N^2)$, with $N$ being the number of nodes in the graph. For this reason, many state-of-the-art SGGs are not applicable to large graphs. In this paper, we present SANGEA, a sizeable synthetic graph generation framework which extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph. Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.}
}
Endnote
%0 Conference Paper
%T SANGEA: Scalable and Attributed Network Generation
%A Valentin Lemaire
%A Youssef Achenchabe
%A Lucas Ody
%A Houssem Eddine Souid
%A Gianmarco Aversano
%A Nicolas Posocco
%A Sabri Skhiri
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-lemaire24a
%I PMLR
%P 678--693
%U https://proceedings.mlr.press/v222/lemaire24a.html
%V 222
%X The topic of synthetic graph generators (SGGs) has recently received much attention due to the wave of the latest breakthroughs in generative modelling. However, many state-of-the-art SGGs do not scale well with the graph size. Indeed, in the generation process, all the possible edges for a fixed number of nodes must often be considered, which scales in $\mathcal{O}(N^2)$, with $N$ being the number of nodes in the graph. For this reason, many state-of-the-art SGGs are not applicable to large graphs. In this paper, we present SANGEA, a sizeable synthetic graph generation framework which extends the applicability of any SGG to large graphs. By first splitting the large graph into communities, SANGEA trains one SGG per community, then links the community graphs back together to create a synthetic large graph. Our experiments show that the graphs generated by SANGEA have high similarity to the original graph, in terms of both topology and node feature distribution. Additionally, these generated graphs achieve high utility on downstream tasks such as link prediction. Finally, we provide a privacy assessment of the generated graphs to show that, even though they have excellent utility, they also achieve reasonable privacy scores.
APA
Lemaire, V., Achenchabe, Y., Ody, L., Souid, H.E., Aversano, G., Posocco, N. & Skhiri, S. (2024). SANGEA: Scalable and Attributed Network Generation. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:678-693. Available from https://proceedings.mlr.press/v222/lemaire24a.html.

Related Material