Clustering High Dimensional Categorical Data via Topographical Features

Chao Chen, Novi Quadrianto
Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2732-2740, 2016.

Abstract

Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admitting an embarrassingly parallel implementation.

Cite this Paper


BibTeX
@InProceedings{pmlr-v48-chenc16,
  title     = {Clustering High Dimensional Categorical Data via Topographical Features},
  author    = {Chen, Chao and Quadrianto, Novi},
  booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
  pages     = {2732--2740},
  year      = {2016},
  editor    = {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume    = {48},
  series    = {Proceedings of Machine Learning Research},
  address   = {New York, New York, USA},
  month     = {20--22 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v48/chenc16.pdf},
  url       = {https://proceedings.mlr.press/v48/chenc16.html},
  abstract  = {Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of real world challenging datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admits an embarrassingly parallel property.}
}
Endnote
%0 Conference Paper
%T Clustering High Dimensional Categorical Data via Topographical Features
%A Chao Chen
%A Novi Quadrianto
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-chenc16
%I PMLR
%P 2732--2740
%U https://proceedings.mlr.press/v48/chenc16.html
%V 48
%X Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of real world challenging datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admits an embarrassingly parallel property.
RIS
TY  - CPAPER
TI  - Clustering High Dimensional Categorical Data via Topographical Features
AU  - Chao Chen
AU  - Novi Quadrianto
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-chenc16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 2732
EP  - 2740
L1  - http://proceedings.mlr.press/v48/chenc16.pdf
UR  - https://proceedings.mlr.press/v48/chenc16.html
AB  - Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of real world challenging datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admits an embarrassingly parallel property.
ER  -
APA
Chen, C. & Quadrianto, N. (2016). Clustering High Dimensional Categorical Data via Topographical Features. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:2732-2740. Available from https://proceedings.mlr.press/v48/chenc16.html.
