Consistent recovery threshold of hidden nearest neighbor graphs

Jian Ding; Yihong Wu; Jiaming Xu; Dana Yang

Consistent recovery threshold of hidden nearest neighbor graphs

Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang

Proceedings of Thirty Third Conference on Learning Theory, PMLR 125:1540-1553, 2020.

Abstract

Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden

$2k$ -nearest neighbor (NN) graph in an

$n$ -vertex complete graph, whose edge weights are independent and distributed according to

$P_n$ for edges in the hidden

$2k$ -NN graph and

$Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as

$n\to \infty$ : (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is

$o(nk)$ . We show that the maximum likelihood estimator achieves (1) exact recovery for

$2 \le k \le n^{o(1)}$ if

$\liminf \frac{2\alpha_n}{\log n}>1$ ; (2) almost exact recovery for

$1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if

$\liminf \frac{kD(P_n||Q_n)}{\log n}>1,$ where

$\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order

$\frac{1}{2}$ and

$D(P_n||Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of

$2k$ -NN graphs that differ from the hidden one by a given number of edges. We also analyze several computationally efficient algorithms and provide sufficient conditions under which they achieve exact/almost exact recovery. In particular, we develop a polynomial-time algorithm that attains the threshold for exact recovery under the small-world model.

Cite this Paper

BibTeX


@InProceedings{pmlr-v125-ding20a,
  title = 	 {Consistent recovery threshold of hidden nearest neighbor graphs},
  author =       {Ding, Jian and Wu, Yihong and Xu, Jiaming and Yang, Dana},
  booktitle = 	 {Proceedings of Thirty Third Conference on Learning Theory},
  pages = 	 {1540--1553},
  year = 	 {2020},
  editor = 	 {Abernethy, Jacob and Agarwal, Shivani},
  volume = 	 {125},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {09--12 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v125/ding20a/ding20a.pdf},
  url = 	 {https://proceedings.mlr.press/v125/ding20a.html},
  abstract = 	 { Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $ \liminf \frac{kD(P_n||Q_n)}{\log n}>1, $ where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n||Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. We also analyze several computationally efficient algorithms and provide sufficient conditions under which they achieve exact/almost exact recovery. In particular, we develop a polynomial-time algorithm that attains the threshold for exact recovery under the small-world model.}
}

Endnote

%0 Conference Paper
%T Consistent recovery threshold of hidden nearest neighbor graphs
%A Jian Ding
%A Yihong Wu
%A Jiaming Xu
%A Dana Yang
%B Proceedings of Thirty Third Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2020
%E Jacob Abernethy
%E Shivani Agarwal	
%F pmlr-v125-ding20a
%I PMLR
%P 1540--1553
%U https://proceedings.mlr.press/v125/ding20a.html
%V 125
%X  Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $ \liminf \frac{kD(P_n||Q_n)}{\log n}>1, $ where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d Q_n}$ is the Rényi divergence of order $\frac{1}{2}$ and $D(P_n||Q_n)$ is the Kullback-Leibler divergence. Under mild distributional assumptions, these conditions are shown to be information-theoretically necessary for any algorithm to succeed. A key challenge in the analysis is the enumeration of $2k$-NN graphs that differ from the hidden one by a given number of edges. We also analyze several computationally efficient algorithms and provide sufficient conditions under which they achieve exact/almost exact recovery. In particular, we develop a polynomial-time algorithm that attains the threshold for exact recovery under the small-world model.

APA


Ding, J., Wu, Y., Xu, J. & Yang, D.. (2020). Consistent recovery threshold of hidden nearest neighbor graphs. Proceedings of Thirty Third Conference on Learning Theory, in Proceedings of Machine Learning Research 125:1540-1553 Available from https://proceedings.mlr.press/v125/ding20a.html.

Related Material

Download PDF