Online Learning of a Dirichlet Process Mixture of Generalized Dirichlet Distributions for Simultaneous Clustering and Localized Feature Selection

Wentao Fan, Nizar Bouguila
Proceedings of the Asian Conference on Machine Learning, PMLR 25:113-128, 2012.

Abstract

Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered as an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and features saliencies are estimated simultaneously and effectively in closed forms. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications namely text document clustering and online human face detection.

Cite this Paper


BibTeX
@InProceedings{pmlr-v25-fan12, title = {Online Learning of a {D}irichlet Process Mixture of Generalized {D}irichlet Distributions for Simultaneous Clustering and Localized Feature Selection}, author = {Fan, Wentao and Bouguila, Nizar}, booktitle = {Proceedings of the Asian Conference on Machine Learning}, pages = {113--128}, year = {2012}, editor = {Hoi, Steven C. H. and Buntine, Wray}, volume = {25}, series = {Proceedings of Machine Learning Research}, address = {Singapore Management University, Singapore}, month = {04--06 Nov}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v25/fan12/fan12.pdf}, url = {https://proceedings.mlr.press/v25/fan12.html}, abstract = {Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered as an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and features saliencies are estimated simultaneously and effectively in closed forms. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications namely text document clustering and online human face detection.} }
Endnote
%0 Conference Paper %T Online Learning of a Dirichlet Process Mixture of Generalized Dirichlet Distributions for Simultaneous Clustering and Localized Feature Selection %A Wentao Fan %A Nizar Bouguila %B Proceedings of the Asian Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2012 %E Steven C. H. Hoi %E Wray Buntine %F pmlr-v25-fan12 %I PMLR %P 113--128 %U https://proceedings.mlr.press/v25/fan12.html %V 25 %X Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered as an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and features saliencies are estimated simultaneously and effectively in closed forms. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications namely text document clustering and online human face detection.
RIS
TY - CPAPER TI - Online Learning of a Dirichlet Process Mixture of Generalized Dirichlet Distributions for Simultaneous Clustering and Localized Feature Selection AU - Wentao Fan AU - Nizar Bouguila BT - Proceedings of the Asian Conference on Machine Learning DA - 2012/11/17 ED - Steven C. H. Hoi ED - Wray Buntine ID - pmlr-v25-fan12 PB - PMLR DP - Proceedings of Machine Learning Research VL - 25 SP - 113 EP - 128 L1 - http://proceedings.mlr.press/v25/fan12/fan12.pdf UR - https://proceedings.mlr.press/v25/fan12.html AB - Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered as an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and features saliencies are estimated simultaneously and effectively in closed forms. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications namely text document clustering and online human face detection. ER -
APA
Fan, W. & Bouguila, N.. (2012). Online Learning of a Dirichlet Process Mixture of Generalized Dirichlet Distributions for Simultaneous Clustering and Localized Feature Selection. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 25:113-128 Available from https://proceedings.mlr.press/v25/fan12.html.

Related Material