A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity

Peter D. Grünwald, Nishant A. Mehta
Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:433-465, 2019.

Abstract

We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
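The abstract names several quantities without defining them. For orientation only, the following are the standard textbook forms of the complexities and conditions it alludes to; the notation is ours and is not reproduced from the paper. The Shtarkov (normalized maximum likelihood, NML) complexity of a model class $\mathcal{P}$ over samples $x^n \in \mathcal{X}^n$ is the log of the Shtarkov sum and equals the minimax individual-sequence log-loss regret; the information (KL) complexity of a generalized Bayesian posterior $\hat\Pi$ relative to a prior $\Pi$ is their KL divergence; and a $(\beta, B)$-Bernstein condition is the usual 'easiness' assumption on the excess losses $\ell_f - \ell_{f^*}$, with $f^*$ the risk minimizer in the class:

\[
\mathrm{COMP}(\mathcal{P}) \;=\; \log \sum_{x^n \in \mathcal{X}^n} \sup_{p \in \mathcal{P}} p(x^n)
\;=\; \min_{q} \max_{x^n \in \mathcal{X}^n} \Bigl( \log \sup_{p \in \mathcal{P}} p(x^n) \;-\; \log q(x^n) \Bigr),
\]
\[
\mathrm{KL}(\hat\Pi \,\|\, \Pi) \;=\; \int \log \frac{d\hat\Pi}{d\Pi}\, d\hat\Pi,
\qquad
\mathbb{E}\bigl[(\ell_f - \ell_{f^*})^2\bigr] \;\le\; B \,\bigl(\mathbb{E}[\ell_f - \ell_{f^*}]\bigr)^{\beta}
\quad \text{for all } f \text{ in the class}.
\]

Here $\beta = 1$ is the easiest regime (e.g., under strong convexity or a Massart-type margin condition) and typically yields fast rates, while $\beta \to 0$ recovers the assumption-free slow-rate setting; how the paper combines these objects into its unified complexity is given in the full text, not here.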

Cite this Paper


BibTeX
@InProceedings{pmlr-v98-grunwald19a,
  title     = {A tight excess risk bound via a unified {PAC}-{B}ayesian–{R}ademacher–{S}htarkov–{MDL} complexity},
  author    = {Gr\"unwald, Peter D. and Mehta, Nishant A.},
  booktitle = {Proceedings of the 30th International Conference on Algorithmic Learning Theory},
  pages     = {433--465},
  year      = {2019},
  editor    = {Garivier, Aurélien and Kale, Satyen},
  volume    = {98},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Mar},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v98/grunwald19a/grunwald19a.pdf},
  url       = {https://proceedings.mlr.press/v98/grunwald19a.html},
  abstract  = {We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.}
}
Endnote
%0 Conference Paper
%T A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity
%A Peter D. Grünwald
%A Nishant A. Mehta
%B Proceedings of the 30th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Aurélien Garivier
%E Satyen Kale
%F pmlr-v98-grunwald19a
%I PMLR
%P 433--465
%U https://proceedings.mlr.press/v98/grunwald19a.html
%V 98
%X We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
APA
Grünwald, P.D. & Mehta, N.A. (2019). A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity. Proceedings of the 30th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 98:433-465. Available from https://proceedings.mlr.press/v98/grunwald19a.html.