Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

Juho Lee, Seungjin Choi
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:581-589, 2015.

Abstract

Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring smallvariance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure emerged from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v38-lee15c, title = {{Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility}}, author = {Juho Lee and Seungjin Choi}, booktitle = {Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics}, pages = {581--589}, year = {2015}, editor = {Guy Lebanon and S. V. N. Vishwanathan}, volume = {38}, series = {Proceedings of Machine Learning Research}, address = {San Diego, California, USA}, month = {09--12 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v38/lee15c.pdf}, url = { http://proceedings.mlr.press/v38/lee15c.html }, abstract = {Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring smallvariance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure emerged from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.} }
Endnote
%0 Conference Paper %T Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility %A Juho Lee %A Seungjin Choi %B Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2015 %E Guy Lebanon %E S. V. N. Vishwanathan %F pmlr-v38-lee15c %I PMLR %P 581--589 %U http://proceedings.mlr.press/v38/lee15c.html %V 38 %X Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring smallvariance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure emerged from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.
RIS
TY - CPAPER TI - Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility AU - Juho Lee AU - Seungjin Choi BT - Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics DA - 2015/02/21 ED - Guy Lebanon ED - S. V. N. Vishwanathan ID - pmlr-v38-lee15c PB - PMLR DP - Proceedings of Machine Learning Research VL - 38 SP - 581 EP - 589 L1 - http://proceedings.mlr.press/v38/lee15c.pdf UR - http://proceedings.mlr.press/v38/lee15c.html AB - Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring smallvariance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure emerged from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method. ER -
APA
Lee, J. & Choi, S.. (2015). Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 38:581-589 Available from http://proceedings.mlr.press/v38/lee15c.html .

Related Material