Generalization Bounds of SGLD for Nonconvex Learning: Two Theoretical Viewpoints
Proceedings of the 31st Conference On Learning Theory, PMLR 75:605-638, 2018.
Abstract
We study the generalization errors of \emph{non-convex} regularized ERM procedures using Stochastic Gradient Langevin Dynamics (SGLD). Two theories are proposed with non-asymptotic discrete-time analysis, using stability and PAC-Bayesian theory respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n}L\sqrt{\beta T_N}\right)$, where $L$ is the Lipschitz parameter, $\beta$ is the inverse temperature, and $T_N$ is the sum of step sizes. For the PAC-Bayesian theory, though the bound has a slower $O(1/\sqrt{n})$ rate, the contribution of each step decays exponentially through time, and the uniform Lipschitz constant is also replaced by the actual norms of the gradients along the optimization trajectory. Our bounds have reasonable dependence on the aggregated step sizes, and do not explicitly depend on the dimension, norms, or other capacity measures of the parameter. The bounds characterize how the noise in the algorithm itself controls the statistical learning behavior in non-convex problems, without uniform convergence in the hypothesis space, which sheds light on the effect of training algorithms on the generalization error of deep neural networks.
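For readers unfamiliar with the algorithm analyzed here, the standard SGLD update adds isotropic Gaussian noise, scaled by the inverse temperature $\beta$, to a (stochastic) gradient step: $\theta_{t+1} = \theta_t - \eta_t \nabla f(\theta_t) + \sqrt{2\eta_t/\beta}\,\xi_t$ with $\xi_t \sim \mathcal{N}(0, I)$. The sketch below is a minimal illustration of this generic update (full-batch gradients, with hypothetical function and variable names), not the paper's specific experimental setup.

```python
import numpy as np

def sgld(grad_fn, theta0, step_sizes, beta, seed=0):
    """Minimal SGLD sketch: gradient step plus Gaussian noise whose
    variance 2*eta/beta is set by the inverse temperature beta.
    T_N in the bound above corresponds to sum(step_sizes)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for eta in step_sizes:
        xi = rng.standard_normal(theta.shape)  # xi ~ N(0, I)
        theta = theta - eta * grad_fn(theta) + np.sqrt(2.0 * eta / beta) * xi
    return theta

# Toy example: quadratic loss f(theta) = 0.5 * ||theta||^2, so grad = theta.
# Larger beta (lower temperature) means less injected noise.
theta = sgld(lambda th: th, np.ones(3), step_sizes=[0.1] * 100, beta=100.0)
```

In this toy run the iterates contract toward the minimizer at the origin while the injected noise keeps them fluctuating around it; the bounds in the abstract quantify how this noise level ($\beta$) and the accumulated step size ($T_N$) govern generalization.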