High-dimensional Clustering onto Hamiltonian Cycle

Tianyi Huang, Shenghui Cheng, Stan Z. Li, Zhengjun Zhang
Proceedings of the 40th International Conference on Machine Learning, PMLR 202:13791-13813, 2023.

Abstract

Clustering aims to group unlabelled samples based on their similarities and is widely used in high-dimensional data analysis. However, most clustering methods merely generate pseudo labels and thus cannot simultaneously present the similarities between clusters and the outliers. This paper proposes a new framework, High-dimensional Clustering onto Hamiltonian Cycle (HCHC), to address these problems. First, HCHC combines global and local structure in a single objective function for deep clustering, refining the labels into relative probabilities, so as to mine the similarities between clusters while preserving the local structure within each cluster. Then, the anchors of the clusters are ordered along the optimal Hamiltonian cycle generated from the cluster similarities and mapped onto the circumference of a circle. Finally, a sample with a higher probability of belonging to a cluster is mapped closer to the corresponding anchor. In this way, the framework makes three aspects visible simultaneously: clusters (formed by samples with high probabilities), cluster similarities (represented as circular distances), and outliers (recognized as dots far away from all clusters). Theoretical analysis and experiments illustrate the superiority of HCHC.
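
To make the pipeline described above concrete, the following minimal sketch (not the authors' HCHC implementation) assumes a soft-assignment matrix `probs` and a cluster-similarity matrix `sim` are already available from the deep clustering stage. It brute-forces the Hamiltonian cycle over clusters with the highest total adjacent similarity, spaces the cluster anchors evenly on a unit circle in that order, and maps each sample to the probability-weighted average of the anchor positions, so a sample with a high probability for one cluster lands near that cluster's anchor. The function names and the uniform anchor spacing are illustrative assumptions; in the paper, cluster similarities are encoded as circular distances.

```python
# Illustrative sketch only: `probs` (n_samples x n_clusters) and `sim`
# (n_clusters x n_clusters) are assumed to come from a deep clustering model.
import itertools
import numpy as np


def best_hamiltonian_cycle(sim):
    """Brute-force the cluster ordering (cycle) with the highest total
    similarity between adjacent clusters; feasible for small cluster counts."""
    k = sim.shape[0]
    best_order, best_score = None, -np.inf
    for perm in itertools.permutations(range(1, k)):
        order = (0,) + perm  # fix the first cluster to skip equivalent rotations
        score = sum(sim[order[i], order[(i + 1) % k]] for i in range(k))
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order)


def map_to_circle(probs, sim):
    """Place cluster anchors on a unit circle in Hamiltonian-cycle order and map
    each sample to the probability-weighted average of the anchor positions, so
    a higher cluster probability pulls the sample closer to that anchor."""
    order = best_hamiltonian_cycle(sim)
    k = sim.shape[0]
    angles = np.empty(k)
    angles[order] = 2 * np.pi * np.arange(k) / k   # evenly spaced along the cycle
    anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (k, 2)
    weights = probs / probs.sum(axis=1, keepdims=True)            # normalize rows
    return weights @ anchors, anchors  # sample coordinates, anchor coordinates


# Toy usage: 3 clusters, random soft labels, and a symmetric similarity matrix.
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=np.ones(3), size=200)
sim = rng.random((3, 3))
sim = (sim + sim.T) / 2
np.fill_diagonal(sim, 0)
coords, anchors = map_to_circle(probs, sim)
print(coords.shape, anchors.shape)  # (200, 2) (3, 2)
```

In such a layout, samples with confident assignments cluster tightly around their anchors, samples shared between two clusters fall between the corresponding anchors, and outliers with diffuse probabilities drift toward the centre, away from all anchors.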

Cite this Paper


BibTeX
@InProceedings{pmlr-v202-huang23d,
  title     = {High-dimensional Clustering onto {H}amiltonian Cycle},
  author    = {Huang, Tianyi and Cheng, Shenghui and Li, Stan Z. and Zhang, Zhengjun},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {13791--13813},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/huang23d/huang23d.pdf},
  url       = {https://proceedings.mlr.press/v202/huang23d.html}
}
Endnote
%0 Conference Paper
%T High-dimensional Clustering onto Hamiltonian Cycle
%A Tianyi Huang
%A Shenghui Cheng
%A Stan Z. Li
%A Zhengjun Zhang
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-huang23d
%I PMLR
%P 13791--13813
%U https://proceedings.mlr.press/v202/huang23d.html
%V 202
APA
Huang, T., Cheng, S., Li, S.Z. & Zhang, Z. (2023). High-dimensional Clustering onto Hamiltonian Cycle. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:13791-13813. Available from https://proceedings.mlr.press/v202/huang23d.html.