Pick-to-Learn and Self-Certified Gaussian Process Approximations

Daniel Marks, Dario Paccagnan
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2656-2664, 2025.

Abstract

Generalisation bounds are crucial for providing data-driven models with performance and safety guarantees. In this respect, bounds that do not require a held-out test set are particularly valuable as they allow the use of all data for training. While many such bounds do not improve upon the train-test approach, which remains the gold standard, the P2L algorithm (Paccagnan et al., 2023) has shown great potential. However, P2L comes with limitations, including computational overhead, reliance on consistent data, and restriction to non-Bayesian settings. In this work, we overcome these challenges in general settings and employ the corresponding results to show that classical Gaussian process (GP) training procedures can be interpreted as instantiations of P2L, thus inheriting tight, self-certified bounds. Three contributions underpin these conclusions. First, we introduce early stopping in P2L, equipping it with a tight generalisation bound to reduce training costs and address the non-consistent case. Second, we adapt P2L to the Bayesian setting and demonstrate its equivalence to posterior updating in a hierarchical model. Third, we show that greedy subset-of-data GPs are special P2L instantiations. Numerical evidence shows that the resulting P2L bounds we obtain compare favourably with the train-test and PAC-Bayes approaches on various real-world datasets.
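To make the "greedy subset-of-data GP as a P2L instantiation" idea concrete, below is a minimal illustrative sketch of a greedy subset-of-data GP regressor with early stopping. The kernel choice, hyperparameters, tolerance-based stopping rule, and helper names are assumptions for illustration only; they are not the authors' exact procedure, and the sketch does not compute the paper's generalisation bound.

```python
# Illustrative sketch (not the paper's implementation): a greedy subset-of-data
# GP in the spirit of P2L. Starting from a single point, repeatedly "pick" the
# training point that the current GP posterior predicts worst, and stop early
# once every remaining point is predicted within a tolerance.
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between row-sets A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * sq / lengthscale**2)


def greedy_subset_gp(X, y, noise=1e-2, tol=0.1, max_subset=50):
    """Greedily grow the subset S of data used to condition the GP posterior."""
    n = X.shape[0]
    S = [int(np.argmax(np.abs(y - y.mean())))]  # seed with the most "surprising" point
    while len(S) < min(max_subset, n):
        K_SS = rbf_kernel(X[S], X[S]) + noise * np.eye(len(S))
        K_xS = rbf_kernel(X, X[S])
        mean = K_xS @ np.linalg.solve(K_SS, y[S])  # posterior mean at all training inputs
        residual = np.abs(y - mean)
        residual[S] = 0.0                          # already-picked points count as fit
        worst = int(np.argmax(residual))
        if residual[worst] <= tol:                 # early stopping: remaining points are explained
            break
        S.append(worst)                            # "pick" the worst-predicted point
    return S


# Tiny usage example on synthetic 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
subset = greedy_subset_gp(X, y, tol=0.1)
print(f"picked {len(subset)} of {len(y)} points")
```

In the paper's framing, the number of picked points and the stopping step are what feed the self-certified bound; the sketch above only illustrates the greedy selection loop itself.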

Cite this Paper

BibTeX
@InProceedings{pmlr-v258-marks25a,
  title     = {Pick-to-Learn and Self-Certified Gaussian Process Approximations},
  author    = {Marks, Daniel and Paccagnan, Dario},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {2656--2664},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/marks25a/marks25a.pdf},
  url       = {https://proceedings.mlr.press/v258/marks25a.html},
  abstract  = {Generalisation bounds are crucial for providing data-driven models with performance and safety guarantees. In this respect, bounds that do not require a held-out test set are particularly valuable as they allow the use of all data for training. While many such bounds do not improve upon the train-test approach, which remains the gold standard, the P2L algorithm (Paccagnan et al., 2023) has shown great potential. However, P2L comes with limitations, including computational overhead, reliance on consistent data, and restriction to non-Bayesian settings. In this work, we overcome these challenges in general settings and employ the corresponding results to show that classical Gaussian process (GP) training procedures can be interpreted as instantiations of P2L, thus inheriting tight, self-certified bounds. Three contributions underpin these conclusions. First, we introduce early stopping in P2L, equipping it with a tight generalisation bound to reduce training costs and address the non-consistent case. Second, we adapt P2L to the Bayesian setting and demonstrate its equivalence to posterior updating in a hierarchical model. Third, we show that greedy subset-of-data GPs are special P2L instantiations. Numerical evidence shows that the resulting P2L bounds we obtain compare favourably with the train-test and PAC-Bayes approaches on various real-world datasets.}
}
Endnote
%0 Conference Paper
%T Pick-to-Learn and Self-Certified Gaussian Process Approximations
%A Daniel Marks
%A Dario Paccagnan
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-marks25a
%I PMLR
%P 2656--2664
%U https://proceedings.mlr.press/v258/marks25a.html
%V 258
%X Generalisation bounds are crucial for providing data-driven models with performance and safety guarantees. In this respect, bounds that do not require a held-out test set are particularly valuable as they allow the use of all data for training. While many such bounds do not improve upon the train-test approach, which remains the gold standard, the P2L algorithm (Paccagnan et al., 2023) has shown great potential. However, P2L comes with limitations, including computational overhead, reliance on consistent data, and restriction to non-Bayesian settings. In this work, we overcome these challenges in general settings and employ the corresponding results to show that classical Gaussian process (GP) training procedures can be interpreted as instantiations of P2L, thus inheriting tight, self-certified bounds. Three contributions underpin these conclusions. First, we introduce early stopping in P2L, equipping it with a tight generalisation bound to reduce training costs and address the non-consistent case. Second, we adapt P2L to the Bayesian setting and demonstrate its equivalence to posterior updating in a hierarchical model. Third, we show that greedy subset-of-data GPs are special P2L instantiations. Numerical evidence shows that the resulting P2L bounds we obtain compare favourably with the train-test and PAC-Bayes approaches on various real-world datasets.
APA
Marks, D. & Paccagnan, D. (2025). Pick-to-Learn and Self-Certified Gaussian Process Approximations. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2656-2664. Available from https://proceedings.mlr.press/v258/marks25a.html.