Clustering High-dimensional Data with Ordered Weighted $\ell_1$ Regularization

Chandramauli Chakraborty; Sayan Paul; Saptarshi Chakraborty; Swagatam Das

Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7176-7189, 2023.

Abstract

Clustering complex high-dimensional data is particularly challenging as the signal-to-noise ratio in such data is significantly lower than in their classical counterparts. This is mainly because most of the features describing a data point have little to no information about the natural grouping of the data. Filtering such features is, thus, critical in harnessing meaningful information from such large-scale data. Many recent methods have attempted to find feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly, especially on microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted $\ell_1$ (OWL) norm for better feature selection. Appealing to the elegant properties of block coordinate descent and Frank-Wolfe algorithms, we not only maintain computational efficiency but also outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of $\mathcal{O}\left(\sqrt{k \log p/n}\right)$, under model-sparsity, bridging the gap between theory and practice of weighted clustering.
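
For readers unfamiliar with the OWL norm named in the abstract, the following is a minimal sketch of its standard definition only: for a vector $x \in \mathbb{R}^p$ and non-negative, non-increasing weights $w_1 \ge \dots \ge w_p \ge 0$, the norm is $\sum_{i=1}^{p} w_i |x|_{[i]}$, where $|x|_{[i]}$ is the $i$-th largest absolute entry of $x$. The function name `owl_norm` is hypothetical, and the sketch does not reproduce the paper's clustering objective or its optimization steps.

```python
def owl_norm(x, w):
    """Ordered Weighted l1 norm: sum_i w[i] * |x|_[i].

    |x|_[i] is the i-th largest absolute entry of x. The weights w are
    assumed to be non-negative and sorted in non-increasing order.
    Illustrative helper only; not taken from the paper's code.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(wi < 0 for wi in w):
        raise ValueError("weights must be non-negative")
    if any(w[i] < w[i + 1] for i in range(len(w) - 1)):
        raise ValueError("weights must be non-increasing")

    # Sort absolute values from largest to smallest, then pair the
    # largest magnitude with the largest weight, and so on.
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, magnitudes))
```

With all weights equal to 1 this reduces to the $\ell_1$ norm, and with weights $(1, 0, \dots, 0)$ it reduces to the $\ell_\infty$ norm, which is one way to sanity-check the sketch.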

Cite this Paper

BibTeX

@InProceedings{pmlr-v206-chakraborty23a,
  title = 	 {Clustering High-dimensional Data with Ordered Weighted $\ell_1$ Regularization},
  author =       {Chakraborty, Chandramauli and Paul, Sayan and Chakraborty, Saptarshi and Das, Swagatam},
  booktitle = 	 {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {7176--7189},
  year = 	 {2023},
  editor = 	 {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem},
  volume = 	 {206},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {25--27 Apr},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v206/chakraborty23a/chakraborty23a.pdf},
  url = 	 {https://proceedings.mlr.press/v206/chakraborty23a.html},
  abstract = 	 {Clustering complex high-dimensional data is particularly challenging as the signal-to-noise ratio in such data is significantly lower than in their classical counterparts. This is mainly because most of the features describing a data point have little to no information about the natural grouping of the data. Filtering such features is, thus, critical in harnessing meaningful information from such large-scale data. Many recent methods have attempted to find feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly, especially on microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted $\ell_1$ (OWL) norm for better feature selection. Appealing to the elegant properties of block coordinate descent and Frank-Wolfe algorithms, we not only maintain computational efficiency but also outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of $\mathcal{O}\left(\sqrt{k \log p/n}\right)$, under model-sparsity, bridging the gap between theory and practice of weighted clustering.}
}

Endnote

%0 Conference Paper
%T Clustering High-dimensional Data with Ordered Weighted $\ell_1$ Regularization
%A Chandramauli Chakraborty
%A Sayan Paul
%A Saptarshi Chakraborty
%A Swagatam Das
%B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2023
%E Francisco Ruiz
%E Jennifer Dy
%E Jan-Willem van de Meent
%F pmlr-v206-chakraborty23a
%I PMLR
%P 7176--7189
%U https://proceedings.mlr.press/v206/chakraborty23a.html
%V 206
%X Clustering complex high-dimensional data is particularly challenging as the signal-to-noise ratio in such data is significantly lower than in their classical counterparts. This is mainly because most of the features describing a data point have little to no information about the natural grouping of the data. Filtering such features is, thus, critical in harnessing meaningful information from such large-scale data. Many recent methods have attempted to find feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly, especially on microarray and single-cell RNA-seq data. This paper extends the merits of weighted center-based clustering through the Ordered Weighted $\ell_1$ (OWL) norm for better feature selection. Appealing to the elegant properties of block coordinate descent and Frank-Wolfe algorithms, we not only maintain computational efficiency but also outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of $\mathcal{O}\left(\sqrt{k \log p/n}\right)$, under model-sparsity, bridging the gap between theory and practice of weighted clustering.

APA

Chakraborty, C., Paul, S., Chakraborty, S. & Das, S. (2023). Clustering High-dimensional Data with Ordered Weighted $\ell_1$ Regularization. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7176-7189. Available from https://proceedings.mlr.press/v206/chakraborty23a.html.
