Semidefinite Programs for Exact Recovery of a Hidden Community

Bruce Hajek; Yihong Wu; Jiaming Xu

Semidefinite Programs for Exact Recovery of a Hidden Community

Bruce Hajek, Yihong Wu, Jiaming Xu

29th Annual Conference on Learning Theory, PMLR 49:1051-1095, 2016.

Abstract

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n \times n symmetric data matrix A, where for distinct indices i,j, A_ij ∼P if i, j are both in the community and A_ij ∼Q otherwise, for two known probability distributions P and Q. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case (P=\rm Bern(p) and Q=\rm Bern(q) with p>q) and the Gaussian case (P=\mathcalN(μ,1) and Q=\mathcalN(0,1) with μ>0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K=ω( n /\log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K=Θ(n/\log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K=o(n/\log n) and K \to ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

Cite this Paper

BibTeX


@InProceedings{pmlr-v49-hajek16,
  title = 	 {Semidefinite Programs for Exact Recovery of a Hidden Community},
  author = 	 {Hajek, Bruce and Wu, Yihong and Xu, Jiaming},
  booktitle = 	 {29th Annual Conference on Learning Theory},
  pages = 	 {1051--1095},
  year = 	 {2016},
  editor = 	 {Feldman, Vitaly and Rakhlin, Alexander and Shamir, Ohad},
  volume = 	 {49},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Columbia University, New York, New York, USA},
  month = 	 {23--26 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v49/hajek16.pdf},
  url = 	 {https://proceedings.mlr.press/v49/hajek16.html},
  abstract = 	 {We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n \times n symmetric data matrix A, where for distinct indices i,j, A_ij ∼P if i, j are both in the community and A_ij ∼Q otherwise, for two known probability distributions P and Q. We identify a  sufficient condition  and a  necessary condition for the success of SDP  for the general model. For both the Bernoulli case (P=\rm Bern(p) and Q=\rm Bern(q) with p>q) and the Gaussian case (P=\mathcalN(μ,1) and Q=\mathcalN(0,1) with μ>0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K=ω( n /\log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K=Θ(n/\log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K=o(n/\log n) and K \to ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix. }
}

Endnote

%0 Conference Paper
%T Semidefinite Programs for Exact Recovery of a Hidden Community
%A Bruce Hajek
%A Yihong Wu
%A Jiaming Xu
%B 29th Annual Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2016
%E Vitaly Feldman
%E Alexander Rakhlin
%E Ohad Shamir	
%F pmlr-v49-hajek16
%I PMLR
%P 1051--1095
%U https://proceedings.mlr.press/v49/hajek16.html
%V 49
%X We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n \times n symmetric data matrix A, where for distinct indices i,j, A_ij ∼P if i, j are both in the community and A_ij ∼Q otherwise, for two known probability distributions P and Q. We identify a  sufficient condition  and a  necessary condition for the success of SDP  for the general model. For both the Bernoulli case (P=\rm Bern(p) and Q=\rm Bern(q) with p>q) and the Gaussian case (P=\mathcalN(μ,1) and Q=\mathcalN(0,1) with μ>0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K=ω( n /\log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K=Θ(n/\log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K=o(n/\log n) and K \to ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.

RIS


TY  - CPAPER
TI  - Semidefinite Programs for Exact Recovery of a Hidden Community
AU  - Bruce Hajek
AU  - Yihong Wu
AU  - Jiaming Xu
BT  - 29th Annual Conference on Learning Theory
DA  - 2016/06/06
ED  - Vitaly Feldman
ED  - Alexander Rakhlin
ED  - Ohad Shamir	
ID  - pmlr-v49-hajek16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 49
SP  - 1051
EP  - 1095
L1  - http://proceedings.mlr.press/v49/hajek16.pdf
UR  - https://proceedings.mlr.press/v49/hajek16.html
AB  - We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality K from an n \times n symmetric data matrix A, where for distinct indices i,j, A_ij ∼P if i, j are both in the community and A_ij ∼Q otherwise, for two known probability distributions P and Q. We identify a  sufficient condition  and a  necessary condition for the success of SDP  for the general model. For both the Bernoulli case (P=\rm Bern(p) and Q=\rm Bern(q) with p>q) and the Gaussian case (P=\mathcalN(μ,1) and Q=\mathcalN(0,1) with μ>0), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If K=ω( n /\log n), SDP attains the information-theoretic recovery limits with sharp constants; (2) If K=Θ(n/\log n), SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If K=o(n/\log n) and K \to ∞, SDP is order-wise suboptimal. The same critical scaling for K is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of n vertices partitioned into multiple communities of equal size K. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix. 
ER  -

APA


Hajek, B., Wu, Y. & Xu, J.. (2016). Semidefinite Programs for Exact Recovery of a Hidden Community. 29th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 49:1051-1095 Available from https://proceedings.mlr.press/v49/hajek16.html.

Related Material

Download PDF