Proceedings of the 36th International Conference on Machine Learning, PMLR 97:2112-2121, 2019.
Dropout is a widely used technique for alleviating overfitting when training large-scale deep neural networks. Numerous works have tried to explain its benefit from different perspectives. In this paper, we take a new perspective: we disentangle the forward and backward passes of dropout and find that the two passes need different levels of noise to improve the generalization performance of deep neural networks. Based on this observation, we propose augmented dropout, which employs different dropping strategies in the forward and backward passes. Experimental results verify the effectiveness of the proposed method.
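
For illustration only, the idea of decoupling the two passes can be sketched as below. Standard dropout reuses the forward mask when propagating gradients; the sketch instead draws a separate mask for each pass. The Bernoulli masks with inverted scaling, the function names, and the parameters `p_forward` and `p_backward` are all assumptions made for this example. The abstract does not specify the dropping strategies used by augmented dropout, so this should not be read as the proposed method.

```python
import random


def bernoulli_mask(n, drop_rate, rng):
    """Return an inverted-dropout mask of length n.

    Each entry is 0.0 with probability drop_rate, otherwise 1 / (1 - drop_rate),
    so the expected value of every entry is 1.
    """
    assert 0.0 <= drop_rate < 1.0, "drop_rate must lie in [0, 1)"
    keep = 1.0 - drop_rate
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(n)]


def decoupled_dropout(x, grad_out, p_forward, p_backward, rng):
    """Hypothetical sketch: independent dropout noise for the two passes.

    x        : activations entering the layer (forward pass)
    grad_out : gradient arriving from the next layer (backward pass)
    Returns the perturbed activations and the perturbed gradient.
    """
    fwd_mask = bernoulli_mask(len(x), p_forward, rng)          # noise on activations
    bwd_mask = bernoulli_mask(len(grad_out), p_backward, rng)  # separate noise on gradients
    y = [xi * m for xi, m in zip(x, fwd_mask)]
    grad_in = [g * m for g, m in zip(grad_out, bwd_mask)]
    return y, grad_in


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
grad_out = [1.0, 1.0, 1.0, 1.0]
y, grad_in = decoupled_dropout(x, grad_out, p_forward=0.5, p_backward=0.0, rng=rng)
```

With `p_backward=0.0` the gradient passes through unchanged while the activations are still perturbed, which shows how the noise level in each pass can be set independently in this sketch.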