Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:2732-2740, 2016.
Abstract
Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.
@InProceedings{pmlr-v48-chenc16,
title = {Clustering High Dimensional Categorical Data via Topographical Features},
author = {Chao Chen and Novi Quadrianto},
booktitle = {Proceedings of The 33rd International Conference on Machine Learning},
pages = {2732--2740},
year = {2016},
editor = {Maria Florina Balcan and Kilian Q. Weinberger},
volume = {48},
series = {Proceedings of Machine Learning Research},
address = {New York, New York, USA},
month = {20--22 Jun},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v48/chenc16.pdf},
url = {http://proceedings.mlr.press/v48/chenc16.html},
abstract = {Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.}
}
%0 Conference Paper
%T Clustering High Dimensional Categorical Data via Topographical Features
%A Chao Chen
%A Novi Quadrianto
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-chenc16
%I PMLR
%J Proceedings of Machine Learning Research
%P 2732--2740
%U http://proceedings.mlr.press
%V 48
%W PMLR
%X Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.
TY - CPAPER
TI - Clustering High Dimensional Categorical Data via Topographical Features
AU - Chao Chen
AU - Novi Quadrianto
BT - Proceedings of The 33rd International Conference on Machine Learning
PY - 2016/06/11
DA - 2016/06/11
ED - Maria Florina Balcan
ED - Kilian Q. Weinberger
ID - pmlr-v48-chenc16
PB - PMLR
SP - 2732
DP - PMLR
EP - 2740
L1 - http://proceedings.mlr.press/v48/chenc16.pdf
UR - http://proceedings.mlr.press/v48/chenc16.html
AB - Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of challenging real-world datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also being embarrassingly parallel.
ER -
Chen, C. & Quadrianto, N. (2016). Clustering High Dimensional Categorical Data via Topographical Features. Proceedings of The 33rd International Conference on Machine Learning, in PMLR 48:2732-2740.