Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization

Jianting Chen
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:8065-8076, 2025.

Abstract

Core-set selection (CS) for deep learning has become crucial for enhancing training efficiency and understanding datasets by identifying the most informative subsets. However, most existing methods rely on heuristics or complex optimization, struggling to balance efficiency and effectiveness. To address this, we propose a novel CS objective that adaptively balances losses between core-set and non-core-set samples by minimizing the sum of squared losses across all samples. Building on this objective, we introduce the Maximum Reduction as Maximum Contribution criterion (MRMC), which identifies samples with the maximal reduction in loss as those making the maximal contribution to overall convergence. Additionally, a balance constraint is incorporated to ensure an even distribution of contributions from the core-set. Experimental results demonstrate that MRMC improves training efficiency significantly while preserving model performance with minimal cost.

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25q, title = {Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization}, author = {Chen, Jianting}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {8065--8076}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25q/chen25q.pdf}, url = {https://proceedings.mlr.press/v267/chen25q.html}, abstract = {Core-set selection (CS) for deep learning has become crucial for enhancing training efficiency and understanding datasets by identifying the most informative subsets. However, most existing methods rely on heuristics or complex optimization, struggling to balance efficiency and effectiveness. To address this, we propose a novel CS objective that adaptively balances losses between core-set and non-core-set samples by minimizing the sum of squared losses across all samples. Building on this objective, we introduce the Maximum Reduction as Maximum Contribution criterion (MRMC), which identifies samples with the maximal reduction in loss as those making the maximal contribution to overall convergence. Additionally, a balance constraint is incorporated to ensure an even distribution of contributions from the core-set. Experimental results demonstrate that MRMC improves training efficiency significantly while preserving model performance with minimal cost.} }
Endnote
%0 Conference Paper %T Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization %A Jianting Chen %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-chen25q %I PMLR %P 8065--8076 %U https://proceedings.mlr.press/v267/chen25q.html %V 267 %X Core-set selection (CS) for deep learning has become crucial for enhancing training efficiency and understanding datasets by identifying the most informative subsets. However, most existing methods rely on heuristics or complex optimization, struggling to balance efficiency and effectiveness. To address this, we propose a novel CS objective that adaptively balances losses between core-set and non-core-set samples by minimizing the sum of squared losses across all samples. Building on this objective, we introduce the Maximum Reduction as Maximum Contribution criterion (MRMC), which identifies samples with the maximal reduction in loss as those making the maximal contribution to overall convergence. Additionally, a balance constraint is incorporated to ensure an even distribution of contributions from the core-set. Experimental results demonstrate that MRMC improves training efficiency significantly while preserving model performance with minimal cost.
APA
Chen, J.. (2025). Efficient Core-set Selection for Deep Learning Through Squared Loss Minimization. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:8065-8076 Available from https://proceedings.mlr.press/v267/chen25q.html.

Related Material