Hierarchical Clustering with Structural Constraints

[edit]

Vaggos Chatziafratis, Rad Niazadeh, Moses Charikar ;
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:773-782, 2018.

Abstract

Hierarchical clustering is a popular unsuperviseddata analysis method. For many real-world applications,we would like to exploit prior informationabout the data that imposes constraintson the clustering hierarchy, and is not capturedby the set of features available to the algorithm.This gives rise to the problem of hierarchicalclustering with structural constraints. Structuralconstraints pose major challenges for bottom-upapproaches like average/single linkage and eventhough they can be naturally incorporated intotop-down divisive algorithms, no formal guaranteesexist on the quality of their output. In thispaper, we provide provable approximation guaranteesfor two simple top-down algorithms, usinga recently introduced optimization viewpoint ofhierarchical clustering with pairwise similarity information(Dasgupta, 2016). We show how to findgood solutions even in the presence of conflictingprior information, by formulating a constraint-basedregularization of the objective. Furthemore, weexplore a variation of this objective for dissimilarityinformation (Cohen-Addad et al., 2018) andimprove upon current techniques. Finally, we demonstrate our approach on a real dataset for the taxonomy application.

Related Material