Distribution-Dependent Analysis of Gibbs-ERM Principle
Proceedings of the Thirty-Second Conference on Learning Theory, PMLR 99:2028-2054, 2019.
Abstract
Gibbs-ERM learning is a natural idealized model of learning with stochastic optimization algorithms (such as SGLD and, to some extent, SGD), and it also arises in other contexts, including PAC-Bayesian theory and sampling mechanisms. In this work we study the excess risk suffered by a Gibbs-ERM learner that uses a non-convex, regularized empirical risk, with the goal of understanding the interplay between the data-generating distribution and learning in large hypothesis spaces. Our main results are \emph{distribution-dependent} upper bounds on several notions of excess risk. We show that, in all cases, the distribution-dependent excess risk is essentially controlled by the \emph{effective dimension} $\text{tr}\left(\boldsymbol{H}^{\star} (\boldsymbol{H}^{\star} + \lambda \boldsymbol{I})^{-1}\right)$ of the problem, where $\boldsymbol{H}^{\star}$ is the Hessian matrix of the risk at a local minimum. This is a well-established notion of effective dimension appearing in several previous works, including the analyses of SGD and ridge regression, but ours is the first work that brings this dimension to the analysis of learning using Gibbs densities. The distribution-dependent view we advocate here improves upon earlier results of Raginsky et al. (2017), and can yield much tighter bounds depending on the interplay between the data-generating distribution and the loss function. The first part of our analysis focuses on the \emph{localized} excess risk in the vicinity of a fixed local minimizer. This result is then extended to bounds on the \emph{global} excess risk by characterizing probabilities of local minima (and their complement) under Gibbs densities, a result which might be of independent interest.
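As an illustration of the effective dimension $\text{tr}\left(\boldsymbol{H}^{\star} (\boldsymbol{H}^{\star} + \lambda \boldsymbol{I})^{-1}\right)$ mentioned in the abstract, the following is a minimal NumPy sketch, not code from the paper. It assumes a symmetric positive semi-definite Hessian; the function name `effective_dimension`, the polynomially decaying spectrum, and the value of `lam` are hypothetical choices made only for the example.

```python
import numpy as np

def effective_dimension(hessian, lam):
    """Compute tr(H (H + lam * I)^{-1}) for a symmetric PSD matrix H.

    For symmetric H with eigenvalues mu_i, the trace equals
    sum_i mu_i / (mu_i + lam), which avoids an explicit matrix inverse.
    """
    eigvals = np.linalg.eigvalsh(hessian)
    return float(np.sum(eigvals / (eigvals + lam)))

# Hypothetical example: a 50-dimensional Hessian whose eigenvalues decay as 1/i^2.
d = 50
lam = 0.01
H = np.diag(1.0 / np.arange(1, d + 1) ** 2)

d_eff = effective_dimension(H, lam)
print(f"ambient dimension: {d}, effective dimension: {d_eff:.2f}")
```

When the spectrum of the Hessian decays quickly, most ratios $\mu_i / (\mu_i + \lambda)$ are close to zero, so the effective dimension can be far smaller than the ambient dimension; this is the sense in which a bound stated in terms of it can be tighter than one stated in terms of the number of parameters.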