Dropout distillation

Samuel Rota Bulò; Lorenzo Porzi; Peter Kontschieder

Proceedings of The 33rd International Conference on Machine Learning, PMLR 48:99-107, 2016.

Abstract

Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process makes it possible to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule, called 'standard dropout', is efficient but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined 'dropout distillation', that allows us to train a predictor so that it better approximates the intractable, but preferable, averaging process while keeping its computational cost under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.
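
The gap between the ideal ensemble average and the scaling rule can be illustrated with a toy example. The sketch below is not the paper's method and does not implement dropout distillation; it only assumes a single sigmoid output unit fed by four hidden activations, with made-up values for `hidden`, `weights`, and `keep_prob`. Because the layer is tiny, all 2^4 dropout masks can be enumerated and the exact average compared against the 'standard dropout' approximation.

```python
import itertools
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(hidden, weights, mask):
    """Sigmoid output for hidden activations multiplied elementwise by a mask."""
    z = sum(h * w * m for h, w, m in zip(hidden, weights, mask))
    return sigmoid(z)


def exact_ensemble_average(hidden, weights, keep_prob):
    """Average the prediction over all 2^n binary dropout masks,
    weighting each mask by its probability under independent dropout."""
    n = len(hidden)
    total = 0.0
    for mask in itertools.product((0, 1), repeat=n):
        kept = sum(mask)
        prob = keep_prob ** kept * (1.0 - keep_prob) ** (n - kept)
        total += prob * predict(hidden, weights, mask)
    return total


def standard_dropout(hidden, weights, keep_prob):
    """Deterministic approximation: scale every unit by its keep probability."""
    return predict(hidden, weights, [keep_prob] * len(hidden))


# Hypothetical values chosen only for illustration.
hidden = [1.0, 2.0, 0.5, 3.0]
weights = [2.0, -1.5, 4.0, 1.0]
keep_prob = 0.5

exact = exact_ensemble_average(hidden, weights, keep_prob)
scaled = standard_dropout(hidden, weights, keep_prob)
print(f"exact ensemble average: {exact:.4f}")
print(f"standard dropout:       {scaled:.4f}")
```

The two numbers differ because the sigmoid is nonlinear: scaling the inputs gives the prediction at the mean pre-activation, not the mean of the predictions. Enumeration costs 2^n forward passes, which is why the exact average is intractable for realistic layer sizes.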

Cite this Paper

BibTeX


@InProceedings{pmlr-v48-bulo16,
  title = 	 {Dropout distillation},
  author = 	 {Bulò, Samuel Rota and Porzi, Lorenzo and Kontschieder, Peter},
  booktitle = 	 {Proceedings of The 33rd International Conference on Machine Learning},
  pages = 	 {99--107},
  year = 	 {2016},
  editor = 	 {Balcan, Maria Florina and Weinberger, Kilian Q.},
  volume = 	 {48},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {New York, New York, USA},
  month = 	 {20--22 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v48/bulo16.pdf},
  url = 	 {https://proceedings.mlr.press/v48/bulo16.html},
  abstract = 	 {Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule called ’standard dropout’ is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined ’dropout distillation’, that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.}
}

Endnote

%0 Conference Paper
%T Dropout distillation
%A Samuel Rota Bulò
%A Lorenzo Porzi
%A Peter Kontschieder
%B Proceedings of The 33rd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2016
%E Maria Florina Balcan
%E Kilian Q. Weinberger
%F pmlr-v48-bulo16
%I PMLR
%P 99--107
%U https://proceedings.mlr.press/v48/bulo16.html
%V 48
%X Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule called ’standard dropout’ is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined ’dropout distillation’, that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.

RIS


TY  - CPAPER
TI  - Dropout distillation
AU  - Samuel Rota Bulò
AU  - Lorenzo Porzi
AU  - Peter Kontschieder
BT  - Proceedings of The 33rd International Conference on Machine Learning
DA  - 2016/06/11
ED  - Maria Florina Balcan
ED  - Kilian Q. Weinberger
ID  - pmlr-v48-bulo16
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 48
SP  - 99
EP  - 107
L1  - http://proceedings.mlr.press/v48/bulo16.pdf
UR  - https://proceedings.mlr.press/v48/bulo16.html
AB  - Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule called ’standard dropout’ is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined ’dropout distillation’, that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.
ER  -

APA


Bulò, S.R., Porzi, L. & Kontschieder, P. (2016). Dropout distillation. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:99-107. Available from https://proceedings.mlr.press/v48/bulo16.html.
