Jumpout: Improved Dropout for Deep Neural Networks with ReLUs

Shengjie Wang, Tianyi Zhou, Jeff Bilmes
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:6668-6676, 2019.

Abstract

We discuss three novel insights about dropout for DNNs with ReLUs: 1) dropout encourages each local linear piece of a DNN to be trained on data points from nearby regions; 2) the same dropout rate results in different (effective) deactivation rates for layers with different portions of ReLU-deactivated neurons; and 3) the rescaling factor of dropout causes a normalization inconsistency between training and test when used together with batch normalization. The above leads to three simple but nontrivial modifications resulting in our method “jumpout.” Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear piece is trained, with high probability, to work better for data points from nearby regions than from more distant ones. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate on the activated neurons is kept the same. Furthermore, it rescales the outputs for a better trade-off that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout significantly improves the performance of different neural nets on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.
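
The three modifications can be illustrated in code. Below is a minimal sketch (assuming PyTorch) of a jumpout-style operation applied to post-ReLU activations; the names jumpout, sigma, and p_max are illustrative, and the exact per-layer normalization and output rescaling used in the paper may differ from the simple choices shown here.

import torch

def jumpout(x, sigma=0.1, p_max=0.6, training=True):
    """Jumpout-style dropout on post-ReLU activations x (zeros = ReLU-deactivated units)."""
    if not training:
        return x  # identity at test time, as with standard inverted dropout

    # 1) Sample the dropout rate from a monotone decreasing distribution:
    #    the right half of a zero-mean Gaussian, truncated to [0, p_max].
    p = torch.randn(1, device=x.device).abs().mul(sigma).clamp(max=p_max).item()

    # 2) Adaptively normalize the rate per layer and per batch: dropping
    #    already-zero units has no effect, so scale p by the fraction of
    #    activated units to keep the effective deactivation rate on the
    #    activated neurons roughly constant (one plausible reading of the
    #    normalization described in the abstract).
    active_frac = (x > 0).float().mean().clamp(min=1e-6).item()
    p_eff = min(p / active_frac, p_max)

    # 3) Rescale the surviving activations. The standard inverted-dropout
    #    factor 1 / (1 - p_eff) is used here as a placeholder; the paper
    #    proposes a different rescaling that keeps both the mean and the
    #    variance of the activations more consistent with batch normalization.
    mask = (torch.rand_like(x) >= p_eff).float()
    return x * mask / (1.0 - p_eff)

In use, such a function would replace a standard dropout call after each ReLU during training, e.g. y = jumpout(torch.relu(pre_act), sigma=0.1).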

Cite this Paper


BibTeX
@InProceedings{pmlr-v97-wang19q,
  title     = {Jumpout: Improved Dropout for Deep Neural Networks with {R}e{LU}s},
  author    = {Wang, Shengjie and Zhou, Tianyi and Bilmes, Jeff},
  booktitle = {Proceedings of the 36th International Conference on Machine Learning},
  pages     = {6668--6676},
  year      = {2019},
  editor    = {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume    = {97},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--15 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v97/wang19q/wang19q.pdf},
  url       = {https://proceedings.mlr.press/v97/wang19q.html},
  abstract  = {We discuss three novel insights about dropout for DNNs with ReLUs: 1) dropout encourages each local linear piece of a DNN to be trained on data points from nearby regions; 2) the same dropout rate results in different (effective) deactivation rates for layers with different portions of ReLU-deactivated neurons; and 3) the rescaling factor of dropout causes a normalization inconsistency between training and test when used together with batch normalization. The above leads to three simple but nontrivial modifications resulting in our method “jumpout.” Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear piece is trained, with high probability, to work better for data points from nearby than more distant regions. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate on the activated neurons is kept the same. Furthermore, it rescales the outputs for a better trade-off that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout significantly improves the performance of different neural nets on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.}
}
Endnote
%0 Conference Paper
%T Jumpout: Improved Dropout for Deep Neural Networks with ReLUs
%A Shengjie Wang
%A Tianyi Zhou
%A Jeff Bilmes
%B Proceedings of the 36th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Kamalika Chaudhuri
%E Ruslan Salakhutdinov
%F pmlr-v97-wang19q
%I PMLR
%P 6668--6676
%U https://proceedings.mlr.press/v97/wang19q.html
%V 97
%X We discuss three novel insights about dropout for DNNs with ReLUs: 1) dropout encourages each local linear piece of a DNN to be trained on data points from nearby regions; 2) the same dropout rate results in different (effective) deactivation rates for layers with different portions of ReLU-deactivated neurons; and 3) the rescaling factor of dropout causes a normalization inconsistency between training and test when used together with batch normalization. The above leads to three simple but nontrivial modifications resulting in our method “jumpout.” Jumpout samples the dropout rate from a monotone decreasing distribution (e.g., the right half of a Gaussian), so each local linear piece is trained, with high probability, to work better for data points from nearby than more distant regions. Jumpout moreover adaptively normalizes the dropout rate at each layer and every training batch, so the effective deactivation rate on the activated neurons is kept the same. Furthermore, it rescales the outputs for a better trade-off that keeps both the variance and mean of neurons more consistent between training and test phases, thereby mitigating the incompatibility between dropout and batch normalization. Jumpout significantly improves the performance of different neural nets on CIFAR10, CIFAR100, Fashion-MNIST, STL10, SVHN, ImageNet-1k, etc., while introducing negligible additional memory and computation costs.
APA
Wang, S., Zhou, T. & Bilmes, J. (2019). Jumpout: Improved Dropout for Deep Neural Networks with ReLUs. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:6668-6676. Available from https://proceedings.mlr.press/v97/wang19q.html.
