[edit]
Consistent recovery threshold of hidden nearest neighbor graphs
Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1540-1553, 2020.
Abstract
Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden 2k-nearest neighbor (NN) graph in an n-vertex complete graph, whose edge weights are independent and distributed according to Pn for edges in the hidden 2k-NN graph and Qn otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as n→∞: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is o(nk). We show that the maximum likelihood estimator achieves (1) exact recovery for 2≤k≤no(1) if lim inf; (2) almost exact recovery for 1 \le k \le o\left( \frac{\log n}{\log \log n} \right) if \liminf \frac{kD(P_n||Q_n)}{\log n}>1, where \alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n} is the Rényi divergence of order \frac{1}{2} and D(P_n||Q_n) is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of 2k-NN graphs that differ from the hidden one by a given number of edges. We also analyze several computationally efficient algorithms and provide sufficient conditions under which they achieve exact/almost exact recovery. In particular, we develop a polynomial-time algorithm that attains the threshold for exact recovery under the small-world model.