Early Stopping as Nonparametric Variational Inference

David Duvenaud; Dougal Maclaurin; Ryan Adams

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1070-1077, 2016.

Abstract

We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
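The entropy-tracking idea in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation; it assumes a quadratic negative log joint, where one gradient step is a linear map with a constant Jacobian, so the entropy change per step is exactly log|det(I - step_size * A)|. The names `entropy_tracked_gd` and `exact_log_marginal`, the zero-mean Gaussian initial distribution, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def entropy_tracked_gd(A, b, init_std, step_size, num_steps,
                       num_samples=2000, seed=0):
    """Gradient descent from random initialisations on the quadratic
    negative log joint  nll(theta) = 0.5 theta^T A theta - b^T theta,
    tracking the entropy of the implied distribution over theta.

    Hypothetical helper for illustration; assumes A is symmetric and
    I - step_size * A is invertible."""
    rng = np.random.default_rng(seed)
    dim = b.shape[0]

    # Samples from the assumed initial distribution q_0 = N(0, init_std^2 I).
    thetas = init_std * rng.standard_normal((num_samples, dim))
    # Closed-form entropy of q_0.
    entropy = 0.5 * dim * (1.0 + np.log(2.0 * np.pi)) + dim * np.log(init_std)

    # One step maps theta -> theta - step_size * (A theta - b); its Jacobian
    # is I - step_size * A, which is constant for a quadratic objective.
    jacobian = np.eye(dim) - step_size * A
    _, log_abs_det = np.linalg.slogdet(jacobian)

    for _ in range(num_steps):
        grads = thetas @ A.T - b          # gradient of nll at every sample
        thetas = thetas - step_size * grads
        entropy += log_abs_det            # S_{t+1} = S_t + log|det J|

    # log p(theta, D) = -nll(theta), averaged over samples from q_T.
    quad = np.einsum("ni,ij,nj->n", thetas, A, thetas)
    log_joint = -(0.5 * quad - thetas @ b)
    lower_bound = log_joint.mean() + entropy
    return lower_bound, entropy, thetas


def exact_log_marginal(A, b):
    """Log of the integral of exp(-nll(theta)) for the same quadratic."""
    dim = b.shape[0]
    _, log_det_A = np.linalg.slogdet(A)
    return (0.5 * b @ np.linalg.solve(A, b)
            + 0.5 * dim * np.log(2.0 * np.pi)
            - 0.5 * log_det_A)


if __name__ == "__main__":
    A = np.array([[1.0, 0.0], [0.0, 4.0]])
    b = np.array([1.0, 2.0])
    bound, entropy, _ = entropy_tracked_gd(
        A, b, init_std=1.0, step_size=0.1, num_steps=5)
    print("estimated lower bound:", bound)
    print("exact log marginal:   ", exact_log_marginal(A, b))
```

In this toy setting the estimate is the Monte Carlo average of the log joint plus the tracked entropy, and it should fall below the exact log marginal when optimization is stopped early. For models such as neural networks the exact log-determinant is generally impractical to compute, so the sketch only illustrates the bookkeeping, not a scalable estimator.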

Cite this Paper

BibTeX


@InProceedings{pmlr-v51-duvenaud16,
  title = 	 {Early Stopping as Nonparametric Variational Inference},
  author = 	 {Duvenaud, David and Maclaurin, Dougal and Adams, Ryan},
  booktitle = 	 {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1070--1077},
  year = 	 {2016},
  editor = 	 {Gretton, Arthur and Robert, Christian C.},
  volume = 	 {51},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Cadiz, Spain},
  month = 	 {09--11 May},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v51/duvenaud16.pdf},
  url = 	 {https://proceedings.mlr.press/v51/duvenaud16.html},
  abstract = 	 {We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps.  By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.}
}

EndNote

%0 Conference Paper
%T Early Stopping as Nonparametric Variational Inference
%A David Duvenaud
%A Dougal Maclaurin
%A Ryan Adams
%B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2016
%E Arthur Gretton
%E Christian C. Robert
%F pmlr-v51-duvenaud16
%I PMLR
%P 1070--1077
%U https://proceedings.mlr.press/v51/duvenaud16.html
%V 51
%X We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps.  By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.

RIS


TY  - CPAPER
TI  - Early Stopping as Nonparametric Variational Inference
AU  - David Duvenaud
AU  - Dougal Maclaurin
AU  - Ryan Adams
BT  - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics
DA  - 2016/05/02
ED  - Arthur Gretton
ED  - Christian C. Robert
ID  - pmlr-v51-duvenaud16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 51
SP  - 1070
EP  - 1077
L1  - http://proceedings.mlr.press/v51/duvenaud16.pdf
UR  - https://proceedings.mlr.press/v51/duvenaud16.html
AB  - We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps.  By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
ER  -

APA


Duvenaud, D., Maclaurin, D. & Adams, R. (2016). Early Stopping as Nonparametric Variational Inference. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1070-1077. Available from https://proceedings.mlr.press/v51/duvenaud16.html.
