No Dimensional Sampling Coresets for Classification

Meysam Alishahi, Jeff M. Phillips
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1008-1049, 2024.

Abstract

We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general: they apply to distributional input and can use i.i.d. samples, so they provide sample complexity bounds, and they work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.
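As context for the abstract, the sketch below illustrates the generic sensitivity sampling recipe for coresets: score each point with a sensitivity upper bound, sample points i.i.d. proportionally to the scores, and reweight the sample so the weighted coreset loss is an unbiased estimate of the full-data loss. This is an illustrative sketch only, not the paper's construction; the function names, the logistic loss, and the uniform sensitivity bounds in the demo are our own assumptions for demonstration.

import numpy as np

def sensitivity_sample(X, y, sens, m, seed=0):
    # Draw m points i.i.d. with probability proportional to the
    # sensitivity upper bounds `sens`, then reweight so the weighted
    # coreset loss is an unbiased estimator of the full-data loss.
    rng = np.random.default_rng(seed)
    total = sens.sum()
    idx = rng.choice(len(X), size=m, p=sens / total)
    weights = total / (m * sens[idx])
    return X[idx], y[idx], weights

def logistic_loss(theta, X, y, weights=None):
    # (Weighted) logistic loss with labels y in {-1, +1}; with
    # weights=None this is the full-data loss
    # sum_i log(1 + exp(-y_i <x_i, theta>)).
    if weights is None:
        weights = np.ones(len(X))
    margins = y * (X @ theta)
    return np.sum(weights * np.logaddexp(0.0, -margins))

# Demo on synthetic data, using uniform sensitivities (a valid but
# loose upper bound) purely for illustration.
rng = np.random.default_rng(1)
n, d = 100_000, 20
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = np.sign(X @ theta_true + rng.normal(size=n))
sens = np.full(n, 1.0)                  # placeholder sensitivity bounds

Xc, yc, w = sensitivity_sample(X, y, sens, m=2_000)
theta = rng.normal(size=d)              # any query model
print(logistic_loss(theta, X, y))       # full loss
print(logistic_loss(theta, Xc, yc, w))  # coreset estimate

The question the paper addresses is how large m must be so that this estimate holds uniformly over all models theta; its headline result is that, with suitable sensitivity bounds, the required coreset size does not depend on the dimension d.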

Cite this Paper

BibTeX
@InProceedings{pmlr-v235-alishahi24a,
  title     = {No Dimensional Sampling Coresets for Classification},
  author    = {Alishahi, Meysam and Phillips, Jeff M.},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {1008--1049},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/alishahi24a/alishahi24a.pdf},
  url       = {https://proceedings.mlr.press/v235/alishahi24a.html},
  abstract  = {We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general: they apply to distributional input and can use i.i.d. samples, so they provide sample complexity bounds, and they work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.}
}
Endnote
%0 Conference Paper
%T No Dimensional Sampling Coresets for Classification
%A Meysam Alishahi
%A Jeff M. Phillips
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-alishahi24a
%I PMLR
%P 1008--1049
%U https://proceedings.mlr.press/v235/alishahi24a.html
%V 235
%X We refine and generalize what is known about coresets for classification problems via the sensitivity sampling framework. Such coresets seek the smallest possible subsets of input data, so one can optimize a loss function on the coreset and ensure approximation guarantees with respect to the original data. Our analysis provides the first no dimensional coresets, so the size does not depend on the dimension. Moreover, our results are general: they apply to distributional input and can use i.i.d. samples, so they provide sample complexity bounds, and they work for a variety of loss functions. A key tool we develop is a Rademacher complexity version of the main sensitivity sampling approach, which can be of independent interest.
APA
Alishahi, M. & Phillips, J.M. (2024). No Dimensional Sampling Coresets for Classification. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:1008-1049. Available from https://proceedings.mlr.press/v235/alishahi24a.html.
