Community Recovery in the Degree-Heterogeneous Stochastic Block Model

Vincent Cohen-Addad, Frederik Mallmann-Trenn, David Saulpic
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1662-1692, 2022.

Abstract

We consider the problem of recovering communities in a random directed graph with planted communities. To model real-world directed graphs such as the Twitter or Instagram graphs that exhibit very heterogeneous degree sequences, we introduce the Degree-Heterogeneous Stochastic Block Model (DHSBM), a generalization of the classic Stochastic Block Model (SBM), where the vertex set is partitioned into communities and each vertex $u$ has two (unknown) associated probabilities, $p_u$ and $q_u$, $p_u > q_u$. An arc from $u$ to $v$ is generated with probability $p_u$ if $u$ and $v$ are in the same community and with probability $q_u$ otherwise. Given a graph generated from this model, the goal is to retrieve the communities. The DHSBM allows to generate graphs with planted communities while allowing heterogeneous degree distributions, a quite important feature of real-world networks. In the case where there are two communities, we present an iterative greedy linear-time algorithm that recovers them whenever $\min_u \frac{p_u - q_u}{\sqrt{p_u}} = \Omega(\sqrt{\log (n)/n})$. We also show that, up to a constant, this condition is necessary. Our results also extend to the standard (undirected) SBM, where $p_u = p$ and $q_u= q$ for all nodes $u$. Our algorithm presents the first linear-time algorithm that recovers exactly the communities at the asymptotic information-theoretic threshold, improving over previous near-linear time spectral approaches.

Cite this Paper


BibTeX
@InProceedings{pmlr-v178-cohen-addad22a, title = {Community Recovery in the Degree-Heterogeneous Stochastic Block Model}, author = {Cohen-Addad, Vincent and Mallmann-Trenn, Frederik and Saulpic, David}, booktitle = {Proceedings of Thirty Fifth Conference on Learning Theory}, pages = {1662--1692}, year = {2022}, editor = {Loh, Po-Ling and Raginsky, Maxim}, volume = {178}, series = {Proceedings of Machine Learning Research}, month = {02--05 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v178/cohen-addad22a/cohen-addad22a.pdf}, url = {https://proceedings.mlr.press/v178/cohen-addad22a.html}, abstract = {We consider the problem of recovering communities in a random directed graph with planted communities. To model real-world directed graphs such as the Twitter or Instagram graphs that exhibit very heterogeneous degree sequences, we introduce the Degree-Heterogeneous Stochastic Block Model (DHSBM), a generalization of the classic Stochastic Block Model (SBM), where the vertex set is partitioned into communities and each vertex $u$ has two (unknown) associated probabilities, $p_u$ and $q_u$, $p_u > q_u$. An arc from $u$ to $v$ is generated with probability $p_u$ if $u$ and $v$ are in the same community and with probability $q_u$ otherwise. Given a graph generated from this model, the goal is to retrieve the communities. The DHSBM allows to generate graphs with planted communities while allowing heterogeneous degree distributions, a quite important feature of real-world networks. In the case where there are two communities, we present an iterative greedy linear-time algorithm that recovers them whenever $\min_u \frac{p_u - q_u}{\sqrt{p_u}} = \Omega(\sqrt{\log (n)/n})$. We also show that, up to a constant, this condition is necessary. Our results also extend to the standard (undirected) SBM, where $p_u = p$ and $q_u= q$ for all nodes $u$. Our algorithm presents the first linear-time algorithm that recovers exactly the communities at the asymptotic information-theoretic threshold, improving over previous near-linear time spectral approaches.} }
Endnote
%0 Conference Paper %T Community Recovery in the Degree-Heterogeneous Stochastic Block Model %A Vincent Cohen-Addad %A Frederik Mallmann-Trenn %A David Saulpic %B Proceedings of Thirty Fifth Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2022 %E Po-Ling Loh %E Maxim Raginsky %F pmlr-v178-cohen-addad22a %I PMLR %P 1662--1692 %U https://proceedings.mlr.press/v178/cohen-addad22a.html %V 178 %X We consider the problem of recovering communities in a random directed graph with planted communities. To model real-world directed graphs such as the Twitter or Instagram graphs that exhibit very heterogeneous degree sequences, we introduce the Degree-Heterogeneous Stochastic Block Model (DHSBM), a generalization of the classic Stochastic Block Model (SBM), where the vertex set is partitioned into communities and each vertex $u$ has two (unknown) associated probabilities, $p_u$ and $q_u$, $p_u > q_u$. An arc from $u$ to $v$ is generated with probability $p_u$ if $u$ and $v$ are in the same community and with probability $q_u$ otherwise. Given a graph generated from this model, the goal is to retrieve the communities. The DHSBM allows to generate graphs with planted communities while allowing heterogeneous degree distributions, a quite important feature of real-world networks. In the case where there are two communities, we present an iterative greedy linear-time algorithm that recovers them whenever $\min_u \frac{p_u - q_u}{\sqrt{p_u}} = \Omega(\sqrt{\log (n)/n})$. We also show that, up to a constant, this condition is necessary. Our results also extend to the standard (undirected) SBM, where $p_u = p$ and $q_u= q$ for all nodes $u$. Our algorithm presents the first linear-time algorithm that recovers exactly the communities at the asymptotic information-theoretic threshold, improving over previous near-linear time spectral approaches.
APA
Cohen-Addad, V., Mallmann-Trenn, F. & Saulpic, D.. (2022). Community Recovery in the Degree-Heterogeneous Stochastic Block Model. Proceedings of Thirty Fifth Conference on Learning Theory, in Proceedings of Machine Learning Research 178:1662-1692 Available from https://proceedings.mlr.press/v178/cohen-addad22a.html.

Related Material