Revisiting Structured Dropout

Yiren Zhao, Oluwatomisin Dada, Robert Mullins, Xitong Gao
Proceedings of the 15th Asian Conference on Machine Learning, PMLR 222:1699-1714, 2024.

Abstract

Large neural networks are often overparameterised and prone to overfitting; Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures, and this has led to the development of multiple structured Dropout approaches to improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call ProbDropBlock, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that, with a simple scheduling strategy, the proposed approach to structured Dropout consistently improves model performance compared to baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show ProbDropBlock improves RoBERTa finetuning on MNLI by 0.22% and training of ResNet50 on ImageNet by 0.28%.
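
The abstract describes ProbDropBlock as dropping contiguous blocks from feature maps with probabilities derived from normalized feature salience. A minimal PyTorch sketch of that idea follows; it is an illustration only, not the authors' implementation. The salience definition (absolute activations), the DropBlock-style block-centre correction (gamma), and the function name prob_drop_block are assumptions made for this sketch.

import torch
import torch.nn.functional as F

def prob_drop_block(x, drop_rate=0.1, block_size=3, training=True):
    """Salience-weighted DropBlock-style masking (hypothetical sketch).

    x: feature map of shape (N, C, H, W).
    drop_rate: target fraction of activations to drop.
    block_size: side length (odd) of the square blocks that are zeroed.
    """
    if not training or drop_rate == 0.0:
        return x

    n, c, h, w = x.shape

    # Per-location salience: absolute activation, normalised so the
    # values over each HxW feature map sum to 1 (assumed definition).
    salience = x.abs()
    salience = salience / salience.sum(dim=(-2, -1), keepdim=True).clamp_min(1e-12)

    # Expected number of block centres needed to drop roughly drop_rate
    # of the map, following the DropBlock-style area correction.
    gamma = drop_rate * h * w / (block_size ** 2)

    # Sample block centres with probability proportional to salience.
    centre_mask = torch.bernoulli((gamma * salience).clamp(max=1.0))

    # Expand each sampled centre into a block_size x block_size block.
    block_mask = F.max_pool2d(
        centre_mask, kernel_size=block_size, stride=1, padding=block_size // 2
    )
    keep_mask = 1.0 - block_mask

    # Rescale so the expected activation magnitude is preserved.
    scale = keep_mask.numel() / keep_mask.sum().clamp_min(1.0)
    return x * keep_mask * scale

In practice, drop_rate would be varied over training in line with the simple scheduling strategy the abstract mentions; the exact schedule is given in the paper, not here.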

Cite this Paper


BibTeX
@InProceedings{pmlr-v222-zhao24a,
  title     = {Revisiting Structured Dropout},
  author    = {Zhao, Yiren and Dada, Oluwatomisin and Mullins, Robert and Gao, Xitong},
  booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
  pages     = {1699--1714},
  year      = {2024},
  editor    = {Yanıkoğlu, Berrin and Buntine, Wray},
  volume    = {222},
  series    = {Proceedings of Machine Learning Research},
  month     = {11--14 Nov},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v222/zhao24a/zhao24a.pdf},
  url       = {https://proceedings.mlr.press/v222/zhao24a.html},
  abstract  = {Large neural networks are often overparameterised and prone to overfitting; Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures, and this has led to the development of multiple structured Dropout approaches to improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call \textbf{\emph{ProbDropBlock}}, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that, with a simple scheduling strategy, the proposed approach to structured Dropout consistently improves model performance compared to baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show \textbf{\emph{ProbDropBlock}} improves RoBERTa finetuning on MNLI by $0.22\%$ and training of ResNet50 on ImageNet by $0.28\%$.}
}
Endnote
%0 Conference Paper
%T Revisiting Structured Dropout
%A Yiren Zhao
%A Oluwatomisin Dada
%A Robert Mullins
%A Xitong Gao
%B Proceedings of the 15th Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Berrin Yanıkoğlu
%E Wray Buntine
%F pmlr-v222-zhao24a
%I PMLR
%P 1699--1714
%U https://proceedings.mlr.press/v222/zhao24a.html
%V 222
%X Large neural networks are often overparameterised and prone to overfitting; Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures, and this has led to the development of multiple structured Dropout approaches to improve model performance and, sometimes, reduce the computational resources required for inference. In this work, we revisit structured Dropout, comparing different Dropout approaches on natural language processing and computer vision tasks for multiple state-of-the-art networks. Additionally, we devise an approach to structured Dropout we call ProbDropBlock, which drops contiguous blocks from feature maps with a probability given by the normalized feature salience values. We find that, with a simple scheduling strategy, the proposed approach to structured Dropout consistently improves model performance compared to baselines and other Dropout approaches on a diverse range of tasks and models. In particular, we show ProbDropBlock improves RoBERTa finetuning on MNLI by 0.22% and training of ResNet50 on ImageNet by 0.28%.
APA
Zhao, Y., Dada, O., Mullins, R. & Gao, X. (2024). Revisiting Structured Dropout. Proceedings of the 15th Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 222:1699-1714. Available from https://proceedings.mlr.press/v222/zhao24a.html.