Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes

Or Dinari, Oren Freifeld
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:231-240, 2020.

Abstract

Adaptive clustering of grouped data is often done via the Hierarchical Dirichlet Process Mixture Model (HDPMM). That approach, however, is limited in its flexibility and usually does not scale well. As a remedy, we propose another, but closely related, hierarchical Bayesian nonparametric framework. Our main contributions are as follows. 1) A new model, called the Versatile HDPMM (vHDPMM), with two possible settings: full and reduced. While the latter is akin to the HDPMM’s setting, the former supports not only global features (as HDPMM does) but also local ones. 2) An effective mechanism for detecting global features. 3) A new sampler that addresses the challenges posed by the vHDPMM and, in the reduced setting, scales better than HDPMM samplers. 4) An efficient, distributed, and easily modifiable implementation that offers more flexibility (even in the reduced setting) than publicly available HDPMM implementations. Finally, we show the utility of the approach in applications such as image cosegmentation, visual topic modeling, and clustering with missing data.

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-dinari20a,
  title     = {Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes},
  author    = {Dinari, Or and Freifeld, Oren},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages     = {231--240},
  year      = {2020},
  editor    = {Peters, Jonas and Sontag, David},
  volume    = {124},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--06 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v124/dinari20a/dinari20a.pdf},
  url       = {https://proceedings.mlr.press/v124/dinari20a.html},
  abstract  = {Adaptive clustering of grouped data is often done via the Hierarchical Dirichlet Process Mixture Model (HDPMM). That approach, however, is limited in its flexibility and usually does not scale well. As a remedy, we propose another, but closely related, hierarchical Bayesian nonparametric framework. Our main contributions are as follows. 1) a new model, called the Versatile HDPMM (vHDPMM), with two possible settings: full and reduced. While the latter is akin to the HDPMM’s setting, the former supports not only global features (as HDPMM does) but also local ones. 2) An effective mechanism for detecting global features. 3) A new sampler that addresses the challenges posed by the vHDPMM and, in the reduced setting, scales better than HDPMM samplers. 4) An efficient, distributed, and easily-modifiable implementation that offers more flexibility (even in the reduced setting) than publicly-available HDPMM implementations. Finally, we show the utility of the approach in applications such as image cosegmentation, visual topic modeling, and clustering with missing data.}
}
Endnote
%0 Conference Paper
%T Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes
%A Or Dinari
%A Oren Freifeld
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-dinari20a
%I PMLR
%P 231--240
%U https://proceedings.mlr.press/v124/dinari20a.html
%V 124
%X Adaptive clustering of grouped data is often done via the Hierarchical Dirichlet Process Mixture Model (HDPMM). That approach, however, is limited in its flexibility and usually does not scale well. As a remedy, we propose another, but closely related, hierarchical Bayesian nonparametric framework. Our main contributions are as follows. 1) a new model, called the Versatile HDPMM (vHDPMM), with two possible settings: full and reduced. While the latter is akin to the HDPMM’s setting, the former supports not only global features (as HDPMM does) but also local ones. 2) An effective mechanism for detecting global features. 3) A new sampler that addresses the challenges posed by the vHDPMM and, in the reduced setting, scales better than HDPMM samplers. 4) An efficient, distributed, and easily-modifiable implementation that offers more flexibility (even in the reduced setting) than publicly-available HDPMM implementations. Finally, we show the utility of the approach in applications such as image cosegmentation, visual topic modeling, and clustering with missing data.
APA
Dinari, O. & Freifeld, O. (2020). Scalable and Flexible Clustering of Grouped Data via Parallel and Distributed Sampling in Versatile Hierarchical Dirichlet Processes. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:231-240. Available from https://proceedings.mlr.press/v124/dinari20a.html.
