Dynamic visualization for L1 fusion convex clustering in near-linear time

Bingyuan Zhang, Jie Chen, Yoshikazu Terada
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:515-524, 2021.

Abstract

Convex clustering has drawn recent attention because of its competitive performance and nice property to guarantee global optimality. However, convex clustering is infeasible due to its high computational cost for large-scale data sets. We propose a novel method to solve the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) algorithm to construct the solution path efficiently. The proposed C-PAINT yields the exact solution while other general solvers for convex problems applied in the convex clustering depend on tuning parameters such as step size and threshold, and it usually takes many iterations to converge. Including a sorting process that almost takes no time in practice, the main part of the algorithm takes only linear time. Thus, C-PAINT has superior scalability comparing to other state-of-art algorithms. Moreover, C-PAINT enables the path visualization of clustering solutions for large data. In particular, experiments show our proposed method can solve the convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method using both synthetic data and real data. Our algorithms are implemented in the dpcc R package.

Cite this Paper


BibTeX
@InProceedings{pmlr-v161-zhang21d, title = {Dynamic visualization for L1 fusion convex clustering in near-linear time}, author = {Zhang, Bingyuan and Chen, Jie and Terada, Yoshikazu}, booktitle = {Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence}, pages = {515--524}, year = {2021}, editor = {de Campos, Cassio and Maathuis, Marloes H.}, volume = {161}, series = {Proceedings of Machine Learning Research}, month = {27--30 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v161/zhang21d/zhang21d.pdf}, url = {https://proceedings.mlr.press/v161/zhang21d.html}, abstract = {Convex clustering has drawn recent attention because of its competitive performance and nice property to guarantee global optimality. However, convex clustering is infeasible due to its high computational cost for large-scale data sets. We propose a novel method to solve the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) algorithm to construct the solution path efficiently. The proposed C-PAINT yields the exact solution while other general solvers for convex problems applied in the convex clustering depend on tuning parameters such as step size and threshold, and it usually takes many iterations to converge. Including a sorting process that almost takes no time in practice, the main part of the algorithm takes only linear time. Thus, C-PAINT has superior scalability comparing to other state-of-art algorithms. Moreover, C-PAINT enables the path visualization of clustering solutions for large data. In particular, experiments show our proposed method can solve the convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method using both synthetic data and real data. Our algorithms are implemented in the dpcc R package.} }
Endnote
%0 Conference Paper %T Dynamic visualization for L1 fusion convex clustering in near-linear time %A Bingyuan Zhang %A Jie Chen %A Yoshikazu Terada %B Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2021 %E Cassio de Campos %E Marloes H. Maathuis %F pmlr-v161-zhang21d %I PMLR %P 515--524 %U https://proceedings.mlr.press/v161/zhang21d.html %V 161 %X Convex clustering has drawn recent attention because of its competitive performance and nice property to guarantee global optimality. However, convex clustering is infeasible due to its high computational cost for large-scale data sets. We propose a novel method to solve the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) algorithm to construct the solution path efficiently. The proposed C-PAINT yields the exact solution while other general solvers for convex problems applied in the convex clustering depend on tuning parameters such as step size and threshold, and it usually takes many iterations to converge. Including a sorting process that almost takes no time in practice, the main part of the algorithm takes only linear time. Thus, C-PAINT has superior scalability comparing to other state-of-art algorithms. Moreover, C-PAINT enables the path visualization of clustering solutions for large data. In particular, experiments show our proposed method can solve the convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method using both synthetic data and real data. Our algorithms are implemented in the dpcc R package.
APA
Zhang, B., Chen, J. & Terada, Y.. (2021). Dynamic visualization for L1 fusion convex clustering in near-linear time. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 161:515-524 Available from https://proceedings.mlr.press/v161/zhang21d.html.

Related Material