New Coresets for Projective Clustering and Applications

Murad Tukan, Xuan Wu, Samson Zhou, Vladimir Braverman, Dan Feldman
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:5391-5415, 2022.

Abstract

$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.
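The $(j,k)$-projective clustering objective described above can be sketched in a few lines. This is an illustrative toy implementation, not the paper's algorithm: the function names are hypothetical, and it uses the plain sum-of-distances cost, whereas the paper handles general distance measures and $M$-estimators. A point's distance to an affine $j$-flat $\{c + Bx\}$ (with $B$ an orthonormal basis of the flat's direction) is the norm of the residual after orthogonal projection.

```python
import numpy as np

def dist_to_flat(p, c, B):
    """Orthogonal distance from point p to the affine j-flat {c + B x}.

    B is a (d, j) matrix with orthonormal columns spanning the flat's
    direction; c is any point on the flat.
    """
    r = p - c
    return np.linalg.norm(r - B @ (B.T @ r))

def projective_cost(P, flats):
    """Sum-of-distances (j,k)-projective clustering cost: each point is
    charged its distance to the nearest of the k flats."""
    return sum(min(dist_to_flat(p, c, B) for (c, B) in flats) for p in P)

# Toy example in R^3 with k = 2 flats of dimension j = 1 (two lines).
P = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
line1 = (np.zeros(3), np.array([[1.0], [0.0], [0.0]]))  # the x-axis
line2 = (np.zeros(3), np.array([[0.0], [1.0], [0.0]]))  # the y-axis
cost = projective_cost(P, [line1, line2])  # only [0,0,1] is off both lines
```

In this example the second and third points lie on one of the two lines, so only the first point contributes distance 1 to the cost. An $L_\infty$ coreset, as constructed in the paper, is a subset of $P$ whose maximum per-point distance approximates that of $P$ for every choice of $k$ flats.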

Cite this Paper
BibTeX
@InProceedings{pmlr-v151-tukan22a,
  title = {New Coresets for Projective Clustering and Applications},
  author = {Tukan, Murad and Wu, Xuan and Zhou, Samson and Braverman, Vladimir and Feldman, Dan},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages = {5391--5415},
  year = {2022},
  editor = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume = {151},
  series = {Proceedings of Machine Learning Research},
  month = {28--30 Mar},
  publisher = {PMLR},
  pdf = {https://proceedings.mlr.press/v151/tukan22a/tukan22a.pdf},
  url = {https://proceedings.mlr.press/v151/tukan22a.html},
  abstract = {$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.}
}
Endnote
%0 Conference Paper
%T New Coresets for Projective Clustering and Applications
%A Murad Tukan
%A Xuan Wu
%A Samson Zhou
%A Vladimir Braverman
%A Dan Feldman
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-tukan22a
%I PMLR
%P 5391--5415
%U https://proceedings.mlr.press/v151/tukan22a.html
%V 151
%X $(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.
APA
Tukan, M., Wu, X., Zhou, S., Braverman, V. & Feldman, D. (2022). New Coresets for Projective Clustering and Applications. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:5391-5415. Available from https://proceedings.mlr.press/v151/tukan22a.html.