Robust Sparsification via Sensitivity

Chansophea Wathanak In, Yi Li, David Woodruff, Xuan Wu
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:26446-26463, 2025.

Abstract

Robustness to outliers is important in machine learning. Many classical problems, including subspace embedding, clustering, and low-rank approximation, lack scalable, outlier-resilient algorithms. This paper considers machine learning problems of the form $\min_{x\in \mathbb{R}^d} F(x)$, where $F(x)=\sum_{i=1}^n F_i(x)$, and their robust counterparts $\min_{x\in\mathbb{R}^d} F^{(m)}(x)$, where $F^{(m)}(x)$ denotes the sum of all but the $m$ largest $F_i(x)$ values. We develop a general framework for constructing $\epsilon$-coresets for such robust problems, where an $\epsilon$-coreset is a weighted subset of functions $\{F_1(x),\dots,F_n(x)\}$ that provides a $(1+\epsilon)$-approximation to $F(x)$. Specifically, if the original problem $F$ has total sensitivity $T$ and admits a vanilla $\epsilon$-coreset of size $S$, our algorithm constructs an $\epsilon$-coreset of size $\tilde{O}(\frac{mT}{\epsilon})+S$ for the robust objective $F^{(m)}$. This coreset size can be shown to be near-tight for $\ell_2$ subspace embedding. Our coreset algorithm has scalable running time and leads to new or improved algorithms for these robust optimization problems. Empirical evaluations demonstrate that our coresets outperform uniform sampling on real-world data sets.
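
To make the abstract's symbols concrete, here is a minimal sketch of the two ingredients it names: the robust objective $F^{(m)}$, which drops the $m$ largest per-point costs, and generic sensitivity sampling, where function $F_i$ is kept with probability proportional to its sensitivity bound $s_i$ and reweighted by $1/(k p_i)$ so the weighted sum is an unbiased estimate of $F$. This is an illustrative sketch only, assuming NumPy; the function names are hypothetical, and the paper's actual coreset construction adds further structure on top of this primitive to handle the $m$ outliers.

```python
import numpy as np

def robust_objective(costs, m):
    """F^{(m)}(x): the sum of all but the m largest per-point costs F_i(x)."""
    costs = np.sort(np.asarray(costs, dtype=float))
    return costs[: len(costs) - m].sum()  # drop the m largest (presumed outliers)

def sensitivity_sample(sens, k, rng=None):
    """Generic sensitivity sampling: draw k indices, index i with probability
    p_i = s_i / T (T = total sensitivity), weighted by 1/(k * p_i) so the
    weighted sum of sampled costs is an unbiased estimator of F(x)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(sens, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=k, replace=True, p=p)
    return idx, 1.0 / (k * p[idx])

# Toy 1-D example with F_i(x) = (a_i - x)^2 and one gross outlier.
a = np.array([0.10, 0.12, 0.15, 0.11, 50.0])
x = 0.12
costs = (a - x) ** 2
print(robust_objective(costs, m=1))   # the outlier's cost is excluded
idx, w = sensitivity_sample(costs, k=3)
print(np.dot(w, costs[idx]))          # unbiased estimate of sum(costs)
```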

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-in25a,
  title     = {Robust Sparsification via Sensitivity},
  author    = {In, Chansophea Wathanak and Li, Yi and Woodruff, David and Wu, Xuan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {26446--26463},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/in25a/in25a.pdf},
  url       = {https://proceedings.mlr.press/v267/in25a.html},
  abstract  = {Robustness to outliers is important in machine learning. Many classical problems, including subspace embedding, clustering, and low-rank approximation, lack scalable, outlier-resilient algorithms. This paper considers machine learning problems of the form $\min_{x\in \mathbb{R}^d} F(x)$, where $F(x)=\sum_{i=1}^n F_i(x)$, and their robust counterparts $\min_{x\in\mathbb{R}^d} F^{(m)}(x)$, where $F^{(m)}(x)$ denotes the sum of all but the $m$ largest $F_i(x)$ values. We develop a general framework for constructing $\epsilon$-coresets for such robust problems, where an $\epsilon$-coreset is a weighted subset of functions $\{F_1(x),\dots,F_n(x)\}$ that provides a $(1+\epsilon)$-approximation to $F(x)$. Specifically, if the original problem $F$ has total sensitivity $T$ and admits a vanilla $\epsilon$-coreset of size $S$, our algorithm constructs an $\epsilon$-coreset of size $\tilde{O}(\frac{mT}{\epsilon})+S$ for the robust objective $F^{(m)}$. This coreset size can be shown to be near-tight for $\ell_2$ subspace embedding. Our coreset algorithm has scalable running time and leads to new or improved algorithms for these robust optimization problems. Empirical evaluations demonstrate that our coresets outperform uniform sampling on real-world data sets.}
}
Endnote
%0 Conference Paper
%T Robust Sparsification via Sensitivity
%A Chansophea Wathanak In
%A Yi Li
%A David Woodruff
%A Xuan Wu
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-in25a
%I PMLR
%P 26446--26463
%U https://proceedings.mlr.press/v267/in25a.html
%V 267
%X Robustness to outliers is important in machine learning. Many classical problems, including subspace embedding, clustering, and low-rank approximation, lack scalable, outlier-resilient algorithms. This paper considers machine learning problems of the form $\min_{x\in \mathbb{R}^d} F(x)$, where $F(x)=\sum_{i=1}^n F_i(x)$, and their robust counterparts $\min_{x\in\mathbb{R}^d} F^{(m)}(x)$, where $F^{(m)}(x)$ denotes the sum of all but the $m$ largest $F_i(x)$ values. We develop a general framework for constructing $\epsilon$-coresets for such robust problems, where an $\epsilon$-coreset is a weighted subset of functions $\{F_1(x),\dots,F_n(x)\}$ that provides a $(1+\epsilon)$-approximation to $F(x)$. Specifically, if the original problem $F$ has total sensitivity $T$ and admits a vanilla $\epsilon$-coreset of size $S$, our algorithm constructs an $\epsilon$-coreset of size $\tilde{O}(\frac{mT}{\epsilon})+S$ for the robust objective $F^{(m)}$. This coreset size can be shown to be near-tight for $\ell_2$ subspace embedding. Our coreset algorithm has scalable running time and leads to new or improved algorithms for these robust optimization problems. Empirical evaluations demonstrate that our coresets outperform uniform sampling on real-world data sets.
APA
In, C.W., Li, Y., Woodruff, D. & Wu, X. (2025). Robust Sparsification via Sensitivity. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:26446-26463. Available from https://proceedings.mlr.press/v267/in25a.html.