Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses

Mathieu Bazinet, Valentina Zantedeschi, Pascal Germain
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:3286-3294, 2025.

Abstract

The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.
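To make the P2L idea concrete, the following is a minimal, illustrative Python sketch of the meta-algorithm's greedy loop, not the paper's exact instantiation: the train and loss callables and the stopping threshold are placeholder assumptions standing in for any base learning routine, real-valued loss, and halting criterion.

def pick_to_learn(dataset, train, loss, threshold):
    """Greedily grow a compression set until every remaining example
    is predicted with loss at most `threshold` (illustrative sketch)."""
    compression_set = [dataset[0]]          # seed with an arbitrary example
    remaining = list(dataset[1:])
    while True:
        predictor = train(compression_set)  # retrain on the compression set only
        if not remaining:
            return predictor, compression_set
        # find the worst-predicted example outside the compression set
        losses = [loss(predictor, z) for z in remaining]
        worst = max(range(len(remaining)), key=lambda i: losses[i])
        if losses[worst] <= threshold:      # the rest is fit well enough: stop
            return predictor, compression_set
        compression_set.append(remaining.pop(worst))

The returned predictor is fully determined by the (typically small) compression set, which is what makes it a sample-compressed predictor on which the paper's generalization bounds can be evaluated.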

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-bazinet25a,
  title     = {Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses},
  author    = {Bazinet, Mathieu and Zantedeschi, Valentina and Germain, Pascal},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {3286--3294},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/bazinet25a/bazinet25a.pdf},
  url       = {https://proceedings.mlr.press/v258/bazinet25a.html},
  abstract  = {The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.}
}
Endnote
%0 Conference Paper
%T Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses
%A Mathieu Bazinet
%A Valentina Zantedeschi
%A Pascal Germain
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-bazinet25a
%I PMLR
%P 3286--3294
%U https://proceedings.mlr.press/v258/bazinet25a.html
%V 258
%X The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.
APA
Bazinet, M., Zantedeschi, V. & Germain, P. (2025). Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:3286-3294. Available from https://proceedings.mlr.press/v258/bazinet25a.html.