A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity

Peter D. Grünwald, Nishant A. Mehta
Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:433-465, 2019.

Abstract

We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to L2(P) entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with L∞ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
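
As brief background (these definitions are standard in learning theory and are not spelled out on this page; the notation below is ours, not the paper's), the three classical quantities the abstract refers to can be sketched as follows. The empirical Rademacher complexity of a function class $\mathcal{F}$ on a sample $z_1,\ldots,z_n$, with i.i.d. uniform $\pm 1$ (Rademacher) variables $\sigma_1,\ldots,\sigma_n$, is

$$\widehat{\mathfrak{R}}_n(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\!\left[\,\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(z_i)\right].$$

The Shtarkov (normalized maximum likelihood, NML) complexity of a model $\{p_f : f\in\mathcal{F}\}$ for samples $z^n$ of length $n$ is

$$\mathrm{COMP}_n(\mathcal{F}) \;=\; \log \int \sup_{f\in\mathcal{F}} p_f(z^n)\,\mathrm{d}z^n$$

(a sum rather than an integral for discrete data), which coincides with the minimax individual-sequence regret under log loss. Finally, the information (KL) complexity of a generalized posterior $\hat{\Pi}$ relative to a prior $\Pi$ is $\mathrm{KL}(\hat{\Pi}\,\|\,\Pi)$, the quantity that appears in PAC-Bayesian bounds. The paper's new complexity interpolates between and generalizes these notions.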

Cite this Paper


BibTeX
@InProceedings{pmlr-v98-grunwald19a,
  title     = {A tight excess risk bound via a unified {PAC}-{B}ayesian–{R}ademacher–{S}htarkov–{MDL} complexity},
  author    = {Gr\"unwald, Peter D. and Mehta, Nishant A.},
  booktitle = {Proceedings of the 30th International Conference on Algorithmic Learning Theory},
  pages     = {433--465},
  year      = {2019},
  editor    = {Garivier, Aurélien and Kale, Satyen},
  volume    = {98},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--24 Mar},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v98/grunwald19a/grunwald19a.pdf},
  url       = {https://proceedings.mlr.press/v98/grunwald19a.html},
  abstract  = {We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.}
}
Endnote
%0 Conference Paper
%T A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity
%A Peter D. Grünwald
%A Nishant A. Mehta
%B Proceedings of the 30th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Aurélien Garivier
%E Satyen Kale
%F pmlr-v98-grunwald19a
%I PMLR
%P 433--465
%U https://proceedings.mlr.press/v98/grunwald19a.html
%V 98
%X We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
APA
Grünwald, P. D., & Mehta, N. A. (2019). A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity. Proceedings of the 30th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 98:433-465. Available from https://proceedings.mlr.press/v98/grunwald19a.html.