Comparing Comparators in Generalization Bounds

Fredrik Hellström, Benjamin Guedj
Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:73-81, 2024.

Abstract

We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training loss and the population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
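As a brief worked illustration (an assumed sub-Gaussian setup, not an example taken from the paper): for a Gaussian bounding distribution $\mathcal{N}(0, \sigma^2)$, the Cramér function named in the abstract can be computed explicitly as a Legendre transform of the CGF.

\[
\psi(\lambda) = \log \mathbb{E}\big[e^{\lambda X}\big] = \frac{\lambda^2 \sigma^2}{2},
\qquad
\psi^*(x) = \sup_{\lambda \in \mathbb{R}} \Big\{ \lambda x - \frac{\lambda^2 \sigma^2}{2} \Big\} = \frac{x^2}{2\sigma^2},
\]

with the supremum attained at $\lambda = x/\sigma^2$. Using this quadratic comparator recovers the familiar square-root form of sub-Gaussian generalization bounds, which is the sense in which known bounds for bounded and sub-Gaussian losses are near-optimal.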

Cite this Paper


BibTeX
@InProceedings{pmlr-v238-hellstrom24a,
  title     = {Comparing Comparators in Generalization Bounds},
  author    = {Hellstr\"{o}m, Fredrik and Guedj, Benjamin},
  booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics},
  pages     = {73--81},
  year      = {2024},
  editor    = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen},
  volume    = {238},
  series    = {Proceedings of Machine Learning Research},
  month     = {02--04 May},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v238/hellstrom24a/hellstrom24a.pdf},
  url       = {https://proceedings.mlr.press/v238/hellstrom24a.html},
  abstract  = {We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training loss and the population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.}
}
Endnote
%0 Conference Paper
%T Comparing Comparators in Generalization Bounds
%A Fredrik Hellström
%A Benjamin Guedj
%B Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2024
%E Sanjoy Dasgupta
%E Stephan Mandt
%E Yingzhen Li
%F pmlr-v238-hellstrom24a
%I PMLR
%P 73--81
%U https://proceedings.mlr.press/v238/hellstrom24a.html
%V 238
%X We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training loss and the population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
APA
Hellström, F. & Guedj, B. (2024). Comparing Comparators in Generalization Bounds. Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 238:73-81. Available from https://proceedings.mlr.press/v238/hellstrom24a.html.
