Why Steiner-tree type algorithms work for community detection
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, PMLR 31:187-195, 2013.
We consider the problem of reconstructing a specific connected community S ⊂V in a graph G = (V, E), where each node v is associated with a signal whose strength grows with the likelihood that v belongs to S. This problem appears in social or protein interaction network, the latter also referred to as the signaling pathway reconstruction problem. We study this community reconstruction problem under several natural generative models, and make the following two contributions. First, in the context of social networks, where the signals are modeled as bounded-supported random variables, we design an efficient algorithm for recovering most members in S with well-controlled false positive overhead, by utilizing the network structure for a large family of “homogeneous” generative models. This positive result is complemented by an information theoretic lower bound for the case where the network structure is unknown or the network is heterogeneous. Second, we consider the case in which the graph represents the protein interaction network, in which it is customary to consider signals that have unbounded support, we generalize our first contribution to give the first theoretical justification of why existing Steiner-tree type heuristics work well in practice.