Breaking the Small Cluster Barrier of Graph Clustering

Nir Ailon; Yudong Chen; Huan Xu

Breaking the Small Cluster Barrier of Graph Clustering

Nir Ailon, Yudong Chen, Huan Xu

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):995-1003, 2013.

Abstract

This paper investigates graph clustering in the planted cluster model in the presence of \em small clusters. Traditional results dictate that for an algorithm to provably correctly recover the clusters, \em all clusters must be sufficiently large (in particular, \tildeΩ(\sqrtn) where n is the number of nodes of the graph). We show that this is not really a restriction: by a more refined analysis of the trace-norm based matrix recovery approach proposed in (Jalali et al. 2011) and (Chen et al. 2012), we prove that small clusters, under certain mild assuptions, do not hinder recovery of large ones. Based on this result, we further devise an iterative algorithm to recover \em almost all clusters via a “peeling strategy”, i.e., recover large clusters first, leading to a reduced problem, and repeat this procedure. These results are extended to the \em partial observation setting, in which only a (chosen) part of the graph is observed. The peeling strategy gives rise to an active learning algorithm, in which edges adjacent to smaller clusters are queried more often as large clusters are learned (and removed). Our findings are supported by experiments. From a high level, this paper sheds novel insights on high-dimesional statistics and learning structured data, by presenting a structured matrix learning problem for which a one shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxations does the job.

Cite this Paper

BibTeX


@InProceedings{pmlr-v28-ailon13,
  title = 	 {Breaking the Small Cluster Barrier of Graph Clustering},
  author = 	 {Ailon, Nir and Chen, Yudong and Xu, Huan},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {995--1003},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/ailon13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/ailon13.html},
  abstract = 	 {This paper investigates graph clustering in the planted cluster model in the   presence of  \em small clusters. Traditional results dictate that for an   algorithm to provably correctly recover the clusters, \em all clusters must be   sufficiently large (in particular, \tildeΩ(\sqrtn) where n is the number   of nodes of the graph). We show that this is not really a restriction: by a more refined   analysis of the trace-norm based matrix recovery approach proposed in (Jalali et al. 2011) and (Chen et al. 2012), we prove that small clusters, under certain mild assuptions, do not hinder recovery of large ones.  Based on this result, we further devise an iterative algorithm   to recover \em almost all clusters via a “peeling strategy”, i.e., recover large clusters   first, leading to a reduced problem, and repeat this procedure.   These results are extended to the    \em partial observation setting, in which only a (chosen) part of the graph is observed.  The peeling strategy gives rise to an active learning algorithm, in which   edges adjacent to smaller clusters are queried more often as large clusters are learned  (and removed).  Our findings are supported by experiments.    From a high level, this paper sheds novel insights on high-dimesional statistics and   learning structured data, by presenting a structured matrix learning problem for which  a one shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxations  does the job.}
}

Endnote

%0 Conference Paper
%T Breaking the Small Cluster Barrier of Graph Clustering
%A Nir Ailon
%A Yudong Chen
%A Huan Xu
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester	
%F pmlr-v28-ailon13
%I PMLR
%P 995--1003
%U https://proceedings.mlr.press/v28/ailon13.html
%V 28
%N 3
%X This paper investigates graph clustering in the planted cluster model in the   presence of  \em small clusters. Traditional results dictate that for an   algorithm to provably correctly recover the clusters, \em all clusters must be   sufficiently large (in particular, \tildeΩ(\sqrtn) where n is the number   of nodes of the graph). We show that this is not really a restriction: by a more refined   analysis of the trace-norm based matrix recovery approach proposed in (Jalali et al. 2011) and (Chen et al. 2012), we prove that small clusters, under certain mild assuptions, do not hinder recovery of large ones.  Based on this result, we further devise an iterative algorithm   to recover \em almost all clusters via a “peeling strategy”, i.e., recover large clusters   first, leading to a reduced problem, and repeat this procedure.   These results are extended to the    \em partial observation setting, in which only a (chosen) part of the graph is observed.  The peeling strategy gives rise to an active learning algorithm, in which   edges adjacent to smaller clusters are queried more often as large clusters are learned  (and removed).  Our findings are supported by experiments.    From a high level, this paper sheds novel insights on high-dimesional statistics and   learning structured data, by presenting a structured matrix learning problem for which  a one shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxations  does the job.

RIS


TY  - CPAPER
TI  - Breaking the Small Cluster Barrier of Graph Clustering
AU  - Nir Ailon
AU  - Yudong Chen
AU  - Huan Xu
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester	
ID  - pmlr-v28-ailon13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 995
EP  - 1003
L1  - http://proceedings.mlr.press/v28/ailon13.pdf
UR  - https://proceedings.mlr.press/v28/ailon13.html
AB  - This paper investigates graph clustering in the planted cluster model in the   presence of  \em small clusters. Traditional results dictate that for an   algorithm to provably correctly recover the clusters, \em all clusters must be   sufficiently large (in particular, \tildeΩ(\sqrtn) where n is the number   of nodes of the graph). We show that this is not really a restriction: by a more refined   analysis of the trace-norm based matrix recovery approach proposed in (Jalali et al. 2011) and (Chen et al. 2012), we prove that small clusters, under certain mild assuptions, do not hinder recovery of large ones.  Based on this result, we further devise an iterative algorithm   to recover \em almost all clusters via a “peeling strategy”, i.e., recover large clusters   first, leading to a reduced problem, and repeat this procedure.   These results are extended to the    \em partial observation setting, in which only a (chosen) part of the graph is observed.  The peeling strategy gives rise to an active learning algorithm, in which   edges adjacent to smaller clusters are queried more often as large clusters are learned  (and removed).  Our findings are supported by experiments.    From a high level, this paper sheds novel insights on high-dimesional statistics and   learning structured data, by presenting a structured matrix learning problem for which  a one shot convex relaxation approach necessarily fails, but a carefully constructed sequence of convex relaxations  does the job.
ER  -

APA


Ailon, N., Chen, Y. & Xu, H.. (2013). Breaking the Small Cluster Barrier of Graph Clustering. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):995-1003 Available from https://proceedings.mlr.press/v28/ailon13.html.

Breaking the Small Cluster Barrier of Graph Clustering

Abstract

Cite this Paper

Related Material