Dropout as a Structured Shrinkage Prior

Eric Nalisnick, Jose Miguel Hernandez-Lobato, Padhraic Smyth
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:4712-4722, 2019.

Abstract

Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior "automatic depth determination" as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-nalisnick19a,
  title     = {Dropout as a Structured Shrinkage Prior},
  author    = {Nalisnick, Eric and Hernandez-Lobato, Jose Miguel and Smyth, Padhraic},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {4712--4722},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/nalisnick19a/nalisnick19a.pdf},
  url       = {https://proceedings.mlr.press/v97/nalisnick19a.html},
  abstract  = {Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior "automatic depth determination" as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.}
}
Endnote
%0 Conference Paper
%T Dropout as a Structured Shrinkage Prior
%A Eric Nalisnick
%A Jose Miguel Hernandez-Lobato
%A Padhraic Smyth
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-nalisnick19a
%I PMLR
%P 4712--4722
%U https://proceedings.mlr.press/v97/nalisnick19a.html
%V 97
%X Dropout regularization of deep neural networks has been a mysterious yet effective tool to prevent overfitting. Explanations for its success range from the prevention of "co-adapted" weights to it being a form of cheap Bayesian inference. We propose a novel framework for understanding multiplicative noise in neural networks, considering continuous distributions as well as Bernoulli noise (i.e. dropout). We show that multiplicative noise induces structured shrinkage priors on a network's weights. We derive the equivalence through reparametrization properties of scale mixtures and without invoking any approximations. Given the equivalence, we then show that dropout's Monte Carlo training objective approximates marginal MAP estimation. We leverage these insights to propose a novel shrinkage framework for resnets, terming the prior "automatic depth determination" as it is the natural analog of automatic relevance determination for network depth. Lastly, we investigate two inference strategies that improve upon the aforementioned MAP approximation in regression benchmarks.
APA
Nalisnick, E., Hernandez-Lobato, J.M. & Smyth, P. (2019). Dropout as a Structured Shrinkage Prior. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:4712-4722. Available from https://proceedings.mlr.press/v97/nalisnick19a.html.