Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:397-420, 2011.
Abstract
We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be compressed substantially further by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting “safe” estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov’s conditions, and reveals new situations in which we can penalize by $(- \log \mathrm{PRIOR})/n$ rather than $\sqrt{(- \log \mathrm{PRIOR})/n}$.