[edit]
Community Recovery in the Degree-Heterogeneous Stochastic Block Model
Proceedings of Thirty Fifth Conference on Learning Theory, PMLR 178:1662-1692, 2022.
Abstract
We consider the problem of recovering communities in a random directed graph with planted communities. To model real-world directed graphs such as the Twitter or Instagram graphs that exhibit very heterogeneous degree sequences, we introduce the Degree-Heterogeneous Stochastic Block Model (DHSBM), a generalization of the classic Stochastic Block Model (SBM), where the vertex set is partitioned into communities and each vertex u has two (unknown) associated probabilities, pu and qu, pu>qu. An arc from u to v is generated with probability pu if u and v are in the same community and with probability qu otherwise. Given a graph generated from this model, the goal is to retrieve the communities. The DHSBM allows to generate graphs with planted communities while allowing heterogeneous degree distributions, a quite important feature of real-world networks. In the case where there are two communities, we present an iterative greedy linear-time algorithm that recovers them whenever min. We also show that, up to a constant, this condition is necessary. Our results also extend to the standard (undirected) SBM, where p_u = p and q_u= q for all nodes u. Our algorithm presents the first linear-time algorithm that recovers exactly the communities at the asymptotic information-theoretic threshold, improving over previous near-linear time spectral approaches.