Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors
Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1377-1386, 2018.
Abstract
We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound's prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels, and the same holds for the Gibbs posterior. To obtain a valid generalization bound, we rely on a result showing that data-dependent priors obtained by stochastic gradient Langevin dynamics (SGLD) yield valid PAC-Bayes bounds provided the target distribution of SGLD is ε-differentially private. We observe that test error on MNIST and CIFAR10 falls within the (empirically non-vacuous) risk bounds computed under the assumption that SGLD reaches stationarity. In particular, Entropy-SGLD can be configured to yield relatively tight generalization bounds and still fit real labels, although these same settings do not obtain state-of-the-art performance.