Variational Dropout Sparsifies Deep Neural Networks

Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2498-2507, 2017.

Abstract

We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report the first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.
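The core ingredients described above can be sketched in a few lines. This is a minimal, illustrative sketch (function names are my own): it uses the sigmoid-based approximation of the per-weight negative KL term with the constants k1 = 0.63576, k2 = 1.87320, k3 = 1.48695 reported in the paper, the additive-noise reparameterization of a weight with unbounded dropout rate alpha, and the paper's pruning rule of dropping weights whose log alpha exceeds a threshold of about 3.

```python
import math

# Constants of the sigmoid approximation of the negative KL term
# (from Molchanov et al., 2017); names K1..K3 are illustrative.
K1, K2, K3 = 0.63576, 1.87320, 1.48695


def neg_kl_approx(log_alpha):
    """Approximate -KL(q || p) contributed by a single weight.

    Tends to 0 as alpha -> infinity, so large dropout rates are
    not penalized, which is what enables unbounded alpha.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-(K2 + K3 * log_alpha)))
    return K1 * sigmoid - 0.5 * math.log1p(math.exp(-log_alpha)) - K1


def sample_weight(theta, log_alpha, eps):
    """Additive noise reparameterization: w = theta + sigma * eps,
    with sigma^2 = alpha * theta^2 and eps ~ N(0, 1).

    Treating (theta, sigma) as the trainable pair reduces the
    variance of the gradient estimator compared to multiplicative
    noise w = theta * (1 + sqrt(alpha) * eps).
    """
    sigma = math.sqrt(math.exp(log_alpha)) * abs(theta)
    return theta + sigma * eps


def keep_weight(log_alpha, threshold=3.0):
    """Pruning rule: drop weights whose log alpha exceeds ~3,
    i.e. whose dropout rate is close to 1."""
    return log_alpha <= threshold
```

During training, `log_alpha` is learned per weight by adding the summed `neg_kl_approx` values to the expected log-likelihood; after training, `keep_weight` masks out the near-binarized weights, which is where the reported sparsity comes from.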

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-molchanov17a,
  title     = {Variational Dropout Sparsifies Deep Neural Networks},
  author    = {Dmitry Molchanov and Arsenii Ashukha and Dmitry Vetrov},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {2498--2507},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/molchanov17a/molchanov17a.pdf},
  url       = {http://proceedings.mlr.press/v70/molchanov17a.html},
  abstract  = {We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.}
}
Endnote
%0 Conference Paper
%T Variational Dropout Sparsifies Deep Neural Networks
%A Dmitry Molchanov
%A Arsenii Ashukha
%A Dmitry Vetrov
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-molchanov17a
%I PMLR
%P 2498--2507
%U http://proceedings.mlr.press/v70/molchanov17a.html
%V 70
%X We explore a recently proposed Variational Dropout technique that provided an elegant Bayesian interpretation to Gaussian Dropout. We extend Variational Dropout to the case when dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator and report first experimental results with individual dropout rates per weight. Interestingly, it leads to extremely sparse solutions both in fully-connected and convolutional layers. This effect is similar to automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters up to 280 times on LeNet architectures and up to 68 times on VGG-like networks with a negligible decrease of accuracy.
APA
Molchanov, D., Ashukha, A. &amp; Vetrov, D. (2017). Variational Dropout Sparsifies Deep Neural Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2498-2507. Available from http://proceedings.mlr.press/v70/molchanov17a.html.