A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity

Peter D. Grünwald; Nishant A. Mehta

A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity

Peter D. Grünwald, Nishant A. Mehta

Proceedings of the 30th International Conference on Algorithmic Learning Theory, PMLR 98:433-465, 2019.

Abstract

We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.

Cite this Paper

BibTeX


@InProceedings{pmlr-v98-grunwald19a,
  title = 	 {A tight excess risk bound via a unified {PAC}-{B}ayesian–{R}ademacher–{S}htarkov–{MDL} complexity},
  author =       {Gr\"unwald, Peter D. and Mehta, Nishant A.},
  booktitle = 	 {Proceedings of the 30th International Conference on Algorithmic Learning Theory},
  pages = 	 {433--465},
  year = 	 {2019},
  editor = 	 {Garivier, Aurélien and Kale, Satyen},
  volume = 	 {98},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {22--24 Mar},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v98/grunwald19a/grunwald19a.pdf},
  url = 	 {https://proceedings.mlr.press/v98/grunwald19a.html},
  abstract = 	 {We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret.  Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.
 }
}

Endnote

%0 Conference Paper
%T A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity
%A Peter D. Grünwald
%A Nishant A. Mehta
%B Proceedings of the 30th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2019
%E Aurélien Garivier
%E Satyen Kale	
%F pmlr-v98-grunwald19a
%I PMLR
%P 433--465
%U https://proceedings.mlr.press/v98/grunwald19a.html
%V 98
%X We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory: for empirical risk minimization (ERM) with arbitrary bounded loss, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information (KL) complexity. For ERM, the new complexity reduces to normalized maximum likelihood complexity, i.e., a minimax log-loss individual sequence regret.  Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity to $L_2(P)$ entropy via Rademacher complexity, generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who covered the log-loss case with $L_\infty$ entropy. Together, these results recover optimal bounds for VC-type and large (polynomial entropy) classes, replacing local Rademacher complexities by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: ‘easiness’ (Bernstein) conditions and model complexity.

APA


Grünwald, P.D. & Mehta, N.A.. (2019). A tight excess risk bound via a unified PAC-Bayesian–Rademacher–Shtarkov–MDL complexity. Proceedings of the 30th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 98:433-465 Available from https://proceedings.mlr.press/v98/grunwald19a.html.

Related Material

Download PDF