A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity
Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:433–465, 2019.
Abstract
We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood (NML) complexity, i.e., a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
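For reference, the NML complexity mentioned in the abstract has a classical closed form due to Shtarkov. The sketch below states the standard definition for a model class over a countable outcome space; the paper's own notation and generality may differ.

```latex
% Shtarkov / normalized maximum likelihood (NML) complexity of a class
% \mathcal{F} of distributions over sequences x^n (standard definition):
\mathrm{COMP}_n(\mathcal{F})
  \;=\; \log \sum_{x^n \in \mathcal{X}^n} \sup_{f \in \mathcal{F}} f(x^n).
% The NML distribution p_{\mathrm{NML}}(x^n) \propto \sup_{f \in \mathcal{F}} f(x^n)
% achieves the minimax individual-sequence log-loss regret, and that
% minimax regret equals \mathrm{COMP}_n(\mathcal{F}).
```

In this sense "NML complexity" and "minimax log-loss individual-sequence regret" are two descriptions of the same quantity, which is how the abstract uses them interchangeably.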