Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity

Peter Grünwald
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:397-420, 2011.

Abstract

We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting “safe” estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov’s conditions, and reveals new situations in which we can penalize by $(- \log \mathrm{PRIOR})/n$ rather than $\sqrt{(- \log \mathrm{PRIOR})/n}$.
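The test described in the abstract can be sketched roughly as follows. This is only a conceptual illustration under assumed conventions (distributions represented as vectorized callables returning pointwise densities, a fixed "substantial compression" slack of one nat), not the paper's actual procedure, which works with a learning-rate parameter and generalized Bayes/MDL estimates.

import numpy as np

def codelength(p, data):
    # Negative log-likelihood (codelength in nats) of the data under p.
    # p is assumed to map an array of observations to pointwise densities.
    return -np.sum(np.log(p(data)))

def is_safe(p_hat, model, data, slack=1.0):
    # Conceptual sketch: check whether a 50/50 mixture of the MDL/MAP
    # estimate p_hat with some other element q of the model compresses
    # the data substantially better (by more than `slack` nats) than
    # p_hat alone. If it does, p_hat is not "safe" and the learning
    # rate should be decreased before re-estimating.
    base = codelength(p_hat, data)
    for q in model:
        mix = lambda x: 0.5 * p_hat(x) + 0.5 * q(x)  # mixture distribution
        if codelength(mix, data) < base - slack:
            return False  # mixture compresses substantially better
    return True  # no mixture compresses substantially better; p_hat passes

In the full procedure, failing such a test would trigger a decrease of the learning rate and recomputation of the MDL/MAP estimate until the test passes.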

Cite this Paper


BibTeX
@InProceedings{pmlr-v19-grunwald11a,
  title     = {Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity},
  author    = {Grünwald, Peter},
  booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
  pages     = {397--420},
  year      = {2011},
  editor    = {Kakade, Sham M. and von Luxburg, Ulrike},
  volume    = {19},
  series    = {Proceedings of Machine Learning Research},
  address   = {Budapest, Hungary},
  month     = {09--11 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v19/grunwald11a/grunwald11a.pdf},
  url       = {https://proceedings.mlr.press/v19/grunwald11a.html},
  abstract  = {We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting “safe” estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov’s conditions, and reveals new situations in which we can penalize by $(- \log \mathrm{PRIOR})/n$ rather than $\sqrt{(- \log \mathrm{PRIOR})/n}$.}
}
APA
Grünwald, P. (2011). Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity. Proceedings of the 24th Annual Conference on Learning Theory, in Proceedings of Machine Learning Research 19:397-420. Available from https://proceedings.mlr.press/v19/grunwald11a.html.