Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints

Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Hongxin Wei, Tongliang Liu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:54082-54103, 2024.

Abstract

Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. Practitioners regularly desire to identify the smallest possible coreset in realistic scenes while maintaining comparable model performance, to minimize costs and maximize acceleration. Motivated by this desideratum, for the first time, we pose the problem of refined coreset selection, in which the minimal coreset size under model performance constraints is explored. Moreover, to address this problem, we propose an innovative method, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes. The implementation is available at https://github.com/xiaoboxia/LBCS.
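
The problem posed in the abstract can be written, as a rough sketch, as a constrained minimization. The notation below is an assumption introduced for illustration and is not taken from the paper: D is the full training set, S a candidate coreset, \theta(S) the model trained on S, L an evaluation loss, and \epsilon \ge 0 a tolerance on the allowed performance gap.

```latex
% Hypothetical notation, introduced here for illustration only:
%   D: full training set, S: candidate coreset,
%   \theta(S): model trained on S, L: evaluation loss, \epsilon: tolerance.
% Requires amsmath for \text.
\begin{equation}
  \min_{S \subseteq D} \; |S|
  \quad \text{subject to} \quad
  L\bigl(\theta(S)\bigr) \le L\bigl(\theta(D)\bigr) + \epsilon .
\end{equation}
```

Under this reading, the priority order mentioned in the abstract would correspond to meeting the performance constraint first and shrinking |S| second. The paper should be consulted for the exact formulation.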

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-xia24b,
  title = {Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints},
  author = {Xia, Xiaobo and Liu, Jiale and Zhang, Shaokun and Wu, Qingyun and Wei, Hongxin and Liu, Tongliang},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {54082--54103},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/xia24b/xia24b.pdf},
  url = {https://proceedings.mlr.press/v235/xia24b.html},
  abstract = {Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. Practitioners regularly desire to identify the smallest possible coreset in realistic scenes while maintaining comparable model performance, to minimize costs and maximize acceleration. Motivated by this desideratum, for the first time, we pose the problem of refined coreset selection, in which the minimal coreset size under model performance constraints is explored. Moreover, to address this problem, we propose an innovative method, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes. The implementation is available at https://github.com/xiaoboxia/LBCS.}
}
Endnote
%0 Conference Paper
%T Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints
%A Xiaobo Xia
%A Jiale Liu
%A Shaokun Zhang
%A Qingyun Wu
%A Hongxin Wei
%A Tongliang Liu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-xia24b
%I PMLR
%P 54082--54103
%U https://proceedings.mlr.press/v235/xia24b.html
%V 235
%X Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms. It strives to identify a small subset from large-scale data, so that training only on the subset practically performs on par with full data. Practitioners regularly desire to identify the smallest possible coreset in realistic scenes while maintaining comparable model performance, to minimize costs and maximize acceleration. Motivated by this desideratum, for the first time, we pose the problem of refined coreset selection, in which the minimal coreset size under model performance constraints is explored. Moreover, to address this problem, we propose an innovative method, which maintains optimization priority order over the model performance and coreset size, and efficiently optimizes them in the coreset selection procedure. Theoretically, we provide the convergence guarantee of the proposed method. Empirically, extensive experiments confirm its superiority compared with previous strategies, often yielding better model performance with smaller coreset sizes. The implementation is available at https://github.com/xiaoboxia/LBCS.
APA
Xia, X., Liu, J., Zhang, S., Wu, Q., Wei, H., & Liu, T. (2024). Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:54082-54103. Available from https://proceedings.mlr.press/v235/xia24b.html.
