Fast dropout training

Sida Wang; Christopher Manning

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(2):118-126, 2013.

Abstract

Preventing feature co-adaptation by encouraging independent contributions from different features often improves classification and regression performance. Dropout training (Hinton et al., 2012) does this by randomly dropping out (zeroing) hidden units and input features during training of neural networks. However, repeatedly sampling a random subset of input features makes training much slower. Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective. This approximation, justified by the central limit theorem and empirical evidence, gives an order of magnitude speedup and more stability. We show how to do fast dropout training for classification, regression, and multilayer neural networks. Beyond dropout, our technique is extended to integrate out other types of noise and small image transformations.
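The Gaussian approximation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each input feature is kept independently with probability `keep_prob`, so the pre-activation s = sum_i w_i z_i x_i (with z_i ~ Bernoulli(keep_prob)) has mean keep_prob * sum_i w_i x_i and variance keep_prob * (1 - keep_prob) * sum_i (w_i x_i)^2. The function names, the random data, and the sample sizes below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def dropout_moments(w, x, keep_prob):
    """Closed-form mean and variance of s = sum_i w_i * z_i * x_i,
    where z_i ~ Bernoulli(keep_prob) independently (assumed noise model)."""
    wx = w * x
    mean = keep_prob * wx.sum()
    var = keep_prob * (1.0 - keep_prob) * (wx ** 2).sum()
    return mean, var


def monte_carlo_moments(w, x, keep_prob, n_samples, rng):
    """Estimate the same two moments by explicitly sampling dropout masks."""
    masks = rng.random((n_samples, x.size)) < keep_prob
    s = (masks * (w * x)).sum(axis=1)
    return s.mean(), s.var()


# Hypothetical data: 200 random weights and features.
rng = np.random.default_rng(0)
w = rng.normal(size=200)
x = rng.normal(size=200)

mean, var = dropout_moments(w, x, keep_prob=0.5)
mc_mean, mc_var = monte_carlo_moments(w, x, keep_prob=0.5, n_samples=20000, rng=rng)

print(f"closed form: mean={mean:.3f}, var={var:.3f}")
print(f"monte carlo: mean={mc_mean:.3f}, var={mc_var:.3f}")
```

The closed-form moments cost one pass over the features, whereas the Monte Carlo estimate needs one pass per sampled mask. Under this sketch, a Gaussian with these two moments stands in for the distribution of s, which is the quantity one would then sample from or integrate over.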

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-wang13a,
  title = {Fast dropout training},
  author = {Wang, Sida and Manning, Christopher},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages = {118--126},
  year = {2013},
  editor = {Dasgupta, Sanjoy and McAllester, David},
  volume = {28},
  number = {2},
  series = {Proceedings of Machine Learning Research},
  address = {Atlanta, Georgia, USA},
  month = {17--19 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v28/wang13a.pdf},
  url = {https://proceedings.mlr.press/v28/wang13a.html},
  abstract = {Preventing feature co-adaptation by encouraging independent contributions from different features often improves classification and regression performance. Dropout training (Hinton et al., 2012) does this by randomly dropping out (zeroing) hidden units and input features during training of neural networks. However, repeatedly sampling a random subset of input features makes training much slower. Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective. This approximation, justified by the central limit theorem and empirical evidence, gives an order of magnitude speedup and more stability. We show how to do fast dropout training for classification, regression, and multilayer neural networks. Beyond dropout, our technique is extended to integrate out other types of noise and small image transformations.}
}

Endnote

%0 Conference Paper
%T Fast dropout training
%A Sida Wang
%A Christopher Manning
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-wang13a
%I PMLR
%P 118--126
%U https://proceedings.mlr.press/v28/wang13a.html
%V 28
%N 2
%X Preventing feature co-adaptation by encouraging independent contributions from different features often improves classification and regression performance.  Dropout training (Hinton et al., 2012) does this by randomly dropping out (zeroing) hidden units and input features during training of neural networks. However, repeatedly sampling a random subset of input features makes training much slower. Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective.  This approximation, justified by the central limit theorem and empirical evidence, gives an order of magnitude speedup and more stability.  We show how to do fast dropout training for classification, regression, and multilayer neural networks. Beyond dropout, our technique is extended to integrate out other types of noise and small image transformations.

RIS

TY  - CPAPER
TI  - Fast dropout training
AU  - Sida Wang
AU  - Christopher Manning
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/13
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-wang13a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 2
SP  - 118
EP  - 126
L1  - http://proceedings.mlr.press/v28/wang13a.pdf
UR  - https://proceedings.mlr.press/v28/wang13a.html
AB  - Preventing feature co-adaptation by encouraging independent contributions from different features often improves classification and regression performance.  Dropout training (Hinton et al., 2012) does this by randomly dropping out (zeroing) hidden units and input features during training of neural networks. However, repeatedly sampling a random subset of input features makes training much slower. Based on an examination of the implied objective function of dropout training, we show how to do fast dropout training by sampling from or integrating a Gaussian approximation, instead of doing Monte Carlo optimization of this objective.  This approximation, justified by the central limit theorem and empirical evidence, gives an order of magnitude speedup and more stability.  We show how to do fast dropout training for classification, regression, and multilayer neural networks. Beyond dropout, our technique is extended to integrate out other types of noise and small image transformations. 
ER  -

APA

Wang, S. & Manning, C. (2013). Fast dropout training. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(2):118-126. Available from https://proceedings.mlr.press/v28/wang13a.html.
