Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting

Yudong Chen, Jiaming Xu
Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2):244-252, 2014.

Abstract

The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the High-Dimensional setting where the number of clusters is allowed to grow with the number of nodes. We show that the space of model parameters can be partitioned into four disjoint regions corresponding to decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds, and no polynomial-time method is known; (3) the easy regime, where the polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple counting/thresholding procedure succeeds. Moreover, each of these algorithms provably fails in the previous harder regimes. Our theorems establish the first minimax recovery results for the high-dimensional setting, and provide the best known guarantees for polynomial-time algorithms. Our results extend to the related problem of submatrix localization, a.k.a. bi-clustering. These results demonstrate the tradeoffs between statistical and computational considerations.

Cite this Paper


BibTeX
@InProceedings{pmlr-v32-chene14, title = {Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting}, author = {Chen, Yudong and Xu, Jiaming}, booktitle = {Proceedings of the 31st International Conference on Machine Learning}, pages = {244--252}, year = {2014}, editor = {Xing, Eric P. and Jebara, Tony}, volume = {32}, number = {2}, series = {Proceedings of Machine Learning Research}, address = {Bejing, China}, month = {22--24 Jun}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v32/chene14.pdf}, url = {https://proceedings.mlr.press/v32/chene14.html}, abstract = {The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the High-Dimensional setting where the number of clusters is allowed to grow with the number of nodes. We show that the space of model parameters can be partitioned into four disjoint regions corresponding to decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds, and no polynomial-time method is known; (3) the easy regime, where the polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple counting/thresholding procedure succeeds. Moreover, each of these algorithms provably fails in the previous harder regimes. Our theorems establish the first minimax recovery results for the high-dimensional setting, and provide the best known guarantees for polynomial-time algorithms. Our results extend to the related problem of submatrix localization, a.k.a. bi-clustering. These results demonstrate the tradeoffs between statistical and computational considerations.} }
Endnote
%0 Conference Paper %T Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting %A Yudong Chen %A Jiaming Xu %B Proceedings of the 31st International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2014 %E Eric P. Xing %E Tony Jebara %F pmlr-v32-chene14 %I PMLR %P 244--252 %U https://proceedings.mlr.press/v32/chene14.html %V 32 %N 2 %X The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the High-Dimensional setting where the number of clusters is allowed to grow with the number of nodes. We show that the space of model parameters can be partitioned into four disjoint regions corresponding to decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds, and no polynomial-time method is known; (3) the easy regime, where the polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple counting/thresholding procedure succeeds. Moreover, each of these algorithms provably fails in the previous harder regimes. Our theorems establish the first minimax recovery results for the high-dimensional setting, and provide the best known guarantees for polynomial-time algorithms. Our results extend to the related problem of submatrix localization, a.k.a. bi-clustering. These results demonstrate the tradeoffs between statistical and computational considerations.
RIS
TY - CPAPER TI - Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting AU - Yudong Chen AU - Jiaming Xu BT - Proceedings of the 31st International Conference on Machine Learning DA - 2014/06/18 ED - Eric P. Xing ED - Tony Jebara ID - pmlr-v32-chene14 PB - PMLR DP - Proceedings of Machine Learning Research VL - 32 IS - 2 SP - 244 EP - 252 L1 - http://proceedings.mlr.press/v32/chene14.pdf UR - https://proceedings.mlr.press/v32/chene14.html AB - The planted models assume that a graph is generated from some unknown clusters by randomly placing edges between nodes according to their cluster memberships; the task is to recover the clusters given the graph. Special cases include planted clique, planted partition, planted densest subgraph and planted coloring. Of particular interest is the High-Dimensional setting where the number of clusters is allowed to grow with the number of nodes. We show that the space of model parameters can be partitioned into four disjoint regions corresponding to decreasing statistical and computational complexities: (1) the impossible regime, where all algorithms fail; (2) the hard regime, where the exponential-time Maximum Likelihood Estimator (MLE) succeeds, and no polynomial-time method is known; (3) the easy regime, where the polynomial-time convexified MLE succeeds; (4) the simple regime, where a simple counting/thresholding procedure succeeds. Moreover, each of these algorithms provably fails in the previous harder regimes. Our theorems establish the first minimax recovery results for the high-dimensional setting, and provide the best known guarantees for polynomial-time algorithms. Our results extend to the related problem of submatrix localization, a.k.a. bi-clustering. These results demonstrate the tradeoffs between statistical and computational considerations. ER -
APA
Chen, Y. & Xu, J.. (2014). Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting. Proceedings of the 31st International Conference on Machine Learning, in Proceedings of Machine Learning Research 32(2):244-252 Available from https://proceedings.mlr.press/v32/chene14.html.

Related Material