How to use dropout correctly on residual networks with batch normalization

Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Donggeon Lee, Sang Woo Kim
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1058-1067, 2023.

Abstract

For the stable optimization of deep neural networks, regularization methods such as dropout and batch normalization have been used in various tasks. Nevertheless, the correct position to apply dropout has rarely been discussed, and different positions have been employed depending on the practitioner. In this study, we investigate the correct position to apply dropout. We demonstrate that for a residual network with batch normalization, applying dropout at certain positions increases the performance, whereas applying dropout at other positions decreases the performance. Based on theoretical analysis, we provide the following guideline for the correct position to apply dropout: apply one dropout after the last batch normalization but before the last weight layer in the residual branch. We provide detailed theoretical explanations to support this claim and demonstrate them through module tests. In addition, we investigate the correct position of dropout in the head that produces the final prediction. Although the current consensus is to apply dropout after global average pooling, we prove that applying dropout before global average pooling leads to a more stable output. The proposed guidelines are validated through experiments using different datasets and models.
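As a concrete illustration of the two guidelines stated in the abstract, the following PyTorch sketch shows a residual block with a single dropout placed after the last batch normalization and before the last weight layer of the residual branch, and a classification head that applies dropout before global average pooling. The pre-activation layout, the module names (PreActResidualBlock, ClassifierHead), the dropout rate, and the use of plain element-wise nn.Dropout are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    """Pre-activation residual block (illustrative sketch, not the paper's exact architecture).

    Dropout is placed after the last batch normalization and before the
    last weight layer (conv2) of the residual branch, following the guideline.
    """
    def __init__(self, channels: int, p_drop: float = 0.1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.drop = nn.Dropout(p_drop)  # after the last BN ...
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # ... before the last weight layer
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.relu(self.bn2(out))
        out = self.drop(out)   # dropout sits between the last BN and the last conv
        out = self.conv2(out)
        return x + out         # identity shortcut

class ClassifierHead(nn.Module):
    """Head sketch: dropout is applied *before* global average pooling."""
    def __init__(self, channels: int, num_classes: int, p_drop: float = 0.1):
        super().__init__()
        self.drop = nn.Dropout(p_drop)        # before GAP, per the second guideline
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, x):
        x = self.drop(x)                      # dropout on the spatial feature map
        x = self.gap(x).flatten(1)
        return self.fc(x)

if __name__ == "__main__":
    block = PreActResidualBlock(channels=64, p_drop=0.1)
    head = ClassifierHead(channels=64, num_classes=10, p_drop=0.1)
    x = torch.randn(2, 64, 32, 32)
    logits = head(block(x))
    print(logits.shape)  # torch.Size([2, 10])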

Cite this Paper


BibTeX
@InProceedings{pmlr-v216-kim23a,
  title     = {How to use dropout correctly on residual networks with batch normalization},
  author    = {Kim, Bum Jun and Choi, Hyeyeon and Jang, Hyeonah and Lee, Donggeon and Kim, Sang Woo},
  booktitle = {Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages     = {1058--1067},
  year      = {2023},
  editor    = {Evans, Robin J. and Shpitser, Ilya},
  volume    = {216},
  series    = {Proceedings of Machine Learning Research},
  month     = {31 Jul--04 Aug},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v216/kim23a/kim23a.pdf},
  url       = {https://proceedings.mlr.press/v216/kim23a.html}
}
Endnote
%0 Conference Paper
%T How to use dropout correctly on residual networks with batch normalization
%A Bum Jun Kim
%A Hyeyeon Choi
%A Hyeonah Jang
%A Donggeon Lee
%A Sang Woo Kim
%B Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2023
%E Robin J. Evans
%E Ilya Shpitser
%F pmlr-v216-kim23a
%I PMLR
%P 1058--1067
%U https://proceedings.mlr.press/v216/kim23a.html
%V 216
APA
Kim, B.J., Choi, H., Jang, H., Lee, D. & Kim, S.W. (2023). How to use dropout correctly on residual networks with batch normalization. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 216:1058-1067. Available from https://proceedings.mlr.press/v216/kim23a.html.