Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection

Vaggos Chatziafratis; Grigory Yaroslavtsev; Euiwoong Lee; Konstantin Makarychev; Sara Ahmadian; Alessandro Epasto; Mohammad Mahdian

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108:3121-3132, 2020.

Abstract

Hierarchical Clustering is an unsupervised data analysis method that has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation. To address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information, which spurred a line of work that sheds light on old algorithms (e.g., Average-Linkage) and also designs new ones. Here, for the maximization dual of Dasgupta’s objective (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage, which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.
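
For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of how the Moseley-Wang objective is commonly stated: each pair (i, j) contributes its similarity times n minus the number of leaves under the pair's lowest common ancestor. The nested-tuple tree encoding, the dictionary of similarities, and the function names are assumptions made for illustration; this is not the paper's algorithm, only an evaluator for a given tree.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaves of a nested-tuple tree; any non-tuple value is a leaf."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def moseley_wang_revenue(tree, w, n=None):
    """Sum over pairs of w[i, j] * (n - number of leaves under lca(i, j)).

    `w` is a hypothetical dict mapping unordered pairs to similarities;
    missing pairs are treated as similarity 0.
    """
    if n is None:
        n = len(leaves(tree))
    if not isinstance(tree, tuple):
        return 0.0
    parts = [leaves(child) for child in tree]
    size = sum(len(p) for p in parts)
    total = 0.0
    # Pairs split apart at this node have this node as lowest common ancestor.
    for a, b in combinations(parts, 2):
        for i in a:
            for j in b:
                total += w.get((i, j), w.get((j, i), 0.0)) * (n - size)
    return total + sum(moseley_wang_revenue(child, w, n) for child in tree)


# Three points where 0 and 1 are the most similar pair.
similarities = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.0}
merge_similar_first = ((0, 1), 2)
merge_dissimilar_first = ((0, 2), 1)
```

Under this definition a tree earns more when similar points are merged low in the hierarchy, so `merge_similar_first` scores higher than `merge_dissimilar_first` on the toy similarities above.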

Cite this Paper

BibTeX


@InProceedings{pmlr-v108-chatziafratis20a,
  title = 	 {Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection},
  author =       {Chatziafratis, Vaggos and Yaroslavtsev, Grigory and Lee, Euiwoong and Makarychev, Konstantin and Ahmadian, Sara and Epasto, Alessandro and Mahdian, Mohammad},
  booktitle = 	 {Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics},
  pages = 	 {3121--3132},
  year = 	 {2020},
  editor = 	 {Chiappa, Silvia and Calandra, Roberto},
  volume = 	 {108},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {26--28 Aug},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v108/chatziafratis20a/chatziafratis20a.pdf},
  url = 	 {https://proceedings.mlr.press/v108/chatziafratis20a.html},
  abstract = 	 {Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta’s objective (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.}
}

Endnote

%0 Conference Paper
%T Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection
%A Vaggos Chatziafratis
%A Grigory Yaroslavtsev
%A Euiwoong Lee
%A Konstantin Makarychev
%A Sara Ahmadian
%A Alessandro Epasto
%A Mohammad Mahdian
%B Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2020
%E Silvia Chiappa
%E Roberto Calandra
%F pmlr-v108-chatziafratis20a
%I PMLR
%P 3121--3132
%U https://proceedings.mlr.press/v108/chatziafratis20a.html
%V 108
%X Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new algorithms. Here, for the maximization dual of Dasgupta’s objective (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities), under the Small Set Expansion hypothesis.

APA


Chatziafratis, V., Yaroslavtsev, G., Lee, E., Makarychev, K., Ahmadian, S., Epasto, A. & Mahdian, M. (2020). Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 108:3121-3132. Available from https://proceedings.mlr.press/v108/chatziafratis20a.html.
