[edit]
Robustness of Spectral Methods for Community Detection
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2831-2860, 2019.
Abstract
The present work is concerned with community detection. Specifically, we consider a random graph drawn according to the stochastic block model: its vertex set is partitioned into blocks, or communities, and edges are placed randomly and independently of each other with probability depending only on the communities of their two endpoints. In this context, our aim is to recover the community labels better than by random guess, based only on the observation of the graph. In the sparse case, where edge probabilities are in $O(1/n)$, we introduce a new spectral method based on the distance matrix $D^{(\ell)}$, where $D^{(\ell)}_{ij} = 1$ iff the graph distance between $i$ and $j$, noted $d(i, j)$ is equal to $\ell$. We show that when $\ell \sim c\log(n)$ for carefully chosen $c$, the eigenvectors associated to the largest eigenvalues of $D^{(\ell)}$ provide enough information to perform non-trivial community recovery with high probability, provided we are above the so-called Kesten-Stigum threshold. This yields an efficient algorithm for community detection, since computation of the matrix $D^{(\ell)}$ can be done in $O(n^{1+\kappa})$ operations for a small constant $\kappa$. We then study the sensitivity of the eigendecomposition of $D^{(\ell)}$ when we allow an adversarial perturbation of the edges of $G$. We show that when the considered perturbation does not affect more than $O(n^\varepsilon)$ vertices for some small $\varepsilon > 0$, the highest eigenvalues and their corresponding eigenvectors incur negligible perturbations, which allows us to still perform efficient recovery. Our proposed spectral method therefore: i) is robust to larger perturbations than prior spectral methods, while semi-definite programming (or SDP) methods can tolerate yet larger perturbations; ii) achieves non-trivial detection down to the KS threshold, which is conjectured to be optimal and is beyond reach of existing SDP approaches; iii) is faster than SDP approaches.