Better-than-KL PAC-Bayes Bounds

Ilja Kuzborskij, Kwang-Sung Jun, Yulian Wu, Kyoungseok Jang, Francesco Orabona
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:3325-3352, 2024.

Abstract

Let $f(\theta, X_1), \dots, f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (data), and $\theta$ is a random parameter distributed according to some data-dependent \emph{posterior} distribution $P_n$. In this paper, we consider the problem of proving concentration inequalities to estimate the mean of the sequence. An example of such a problem is the estimation of the generalization error of some predictor trained by a stochastic algorithm, such as a neural network, where $f$ is a loss function. Classically, this problem is approached through a \emph{PAC-Bayes} analysis where, in addition to the posterior, we choose a \emph{prior} distribution that captures our belief about the inductive bias of the learning problem. The key quantity in PAC-Bayes concentration bounds is then a divergence that captures the \emph{complexity} of the learning problem, for which the de facto standard choice is the Kullback-Leibler (KL) divergence. However, the tightness of this choice has rarely been questioned. In this paper, we challenge the tightness of KL-divergence-based bounds by showing that it is possible to achieve a strictly tighter bound. In particular, we demonstrate new \emph{high-probability} PAC-Bayes bounds with a novel, \emph{better-than-KL} divergence inspired by Zhang et al. (2022). Our proof is inspired by recent advances in the regret analysis of gambling algorithms and their use to derive concentration inequalities. Our result is the first of its kind in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL. Thus, we believe our work marks the first step towards identifying optimal rates of PAC-Bayes bounds.
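For context, a standard example of the KL-divergence-based bounds referred to above is the McAllester–Maurer form, given here only as a reference point (it is not stated in the paper, and the notation $L(\theta) = \mathbb{E}[f(\theta, X)]$, $\hat{L}_n(\theta) = \tfrac{1}{n}\sum_{i=1}^n f(\theta, X_i)$, prior $Q$, and confidence level $\delta \in (0,1)$ is introduced here for illustration): assuming $f \in [0,1]$ and a data-free prior $Q$, with probability at least $1-\delta$, simultaneously for all posteriors $P_n$,

$$\mathbb{E}_{\theta \sim P_n}\big[L(\theta)\big] \;\le\; \mathbb{E}_{\theta \sim P_n}\big[\hat{L}_n(\theta)\big] + \sqrt{\frac{\mathrm{KL}(P_n \,\|\, Q) + \ln(2\sqrt{n}/\delta)}{2n}}.$$

The paper's contribution is a high-probability bound of this flavor in which the $\mathrm{KL}(P_n \,\|\, Q)$ term is replaced by a new divergence that, per the abstract, yields a strictly tighter bound.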

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-kuzborskij24a,
  title     = {Better-than-{KL} {PAC}-{B}ayes {B}ounds},
  author    = {Kuzborskij, Ilja and Jun, Kwang-Sung and Wu, Yulian and Jang, Kyoungseok and Orabona, Francesco},
  booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory},
  pages     = {3325--3352},
  year      = {2024},
  editor    = {Agrawal, Shipra and Roth, Aaron},
  volume    = {247},
  series    = {Proceedings of Machine Learning Research},
  month     = {30 Jun--03 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v247/kuzborskij24a/kuzborskij24a.pdf},
  url       = {https://proceedings.mlr.press/v247/kuzborskij24a.html}
}
Endnote
%0 Conference Paper
%T Better-than-KL PAC-Bayes Bounds
%A Ilja Kuzborskij
%A Kwang-Sung Jun
%A Yulian Wu
%A Kyoungseok Jang
%A Francesco Orabona
%B Proceedings of Thirty Seventh Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Shipra Agrawal
%E Aaron Roth
%F pmlr-v247-kuzborskij24a
%I PMLR
%P 3325--3352
%U https://proceedings.mlr.press/v247/kuzborskij24a.html
%V 247
APA
Kuzborskij, I., Jun, K., Wu, Y., Jang, K. & Orabona, F. (2024). Better-than-KL PAC-Bayes Bounds. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:3325-3352. Available from https://proceedings.mlr.press/v247/kuzborskij24a.html.
