When Does Re-initialization Work?

Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jorg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu
Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops, PMLR 187:12-26, 2023.

Abstract

Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
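
The abstract refers to re-initializing parts of a network partway through training. The paper compares several such methods; purely as an illustrative sketch (not the authors' protocol), the snippet below periodically re-initializes the final classification layer between training rounds in PyTorch. The toy model, random stand-in data, and the choice to reset only the head layer and the optimizer state are assumptions made for illustration.

# Minimal sketch of re-initialization during training (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32 * 3, 256), nn.ReLU(),
    nn.Linear(256, 10),  # classification "head" that will be re-initialized
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Toy stand-in for an image classification dataset (CIFAR-10-shaped inputs).
x = torch.randn(512, 3, 32, 32)
y = torch.randint(0, 10, (512,))

num_rounds, epochs_per_round = 4, 5
for round_idx in range(num_rounds):
    for epoch in range(epochs_per_round):
        opt.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    print(f"round {round_idx}: loss={loss.item():.3f}")
    # Re-initialization step: reset the head layer and the optimizer state
    # before the next round of training on the same data.
    if round_idx < num_rounds - 1:
        model[-1].reset_parameters()
        opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
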

Cite this Paper


BibTeX
@InProceedings{pmlr-v187-zaidi23a,
  title     = {When Does Re-initialization Work?},
  author    = {Zaidi, Sheheryar and Berariu, Tudor and Kim, Hyunjik and Bornschein, Jorg and Clopath, Claudia and Teh, Yee Whye and Pascanu, Razvan},
  booktitle = {Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops},
  pages     = {12--26},
  year      = {2023},
  editor    = {Antorán, Javier and Blaas, Arno and Feng, Fan and Ghalebikesabi, Sahra and Mason, Ian and Pradier, Melanie F. and Rohde, David and Ruiz, Francisco J. R. and Schein, Aaron},
  volume    = {187},
  series    = {Proceedings of Machine Learning Research},
  month     = {03 Dec},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v187/zaidi23a/zaidi23a.pdf},
  url       = {https://proceedings.mlr.press/v187/zaidi23a.html},
  abstract  = {Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.}
}
Endnote
%0 Conference Paper
%T When Does Re-initialization Work?
%A Sheheryar Zaidi
%A Tudor Berariu
%A Hyunjik Kim
%A Jorg Bornschein
%A Claudia Clopath
%A Yee Whye Teh
%A Razvan Pascanu
%B Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops
%C Proceedings of Machine Learning Research
%D 2023
%E Javier Antorán
%E Arno Blaas
%E Fan Feng
%E Sahra Ghalebikesabi
%E Ian Mason
%E Melanie F. Pradier
%E David Rohde
%E Francisco J. R. Ruiz
%E Aaron Schein
%F pmlr-v187-zaidi23a
%I PMLR
%P 12--26
%U https://proceedings.mlr.press/v187/zaidi23a.html
%V 187
%X Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
APA
Zaidi, S., Berariu, T., Kim, H., Bornschein, J., Clopath, C., Teh, Y.W. & Pascanu, R. (2023). When Does Re-initialization Work? Proceedings on "I Can't Believe It's Not Better! - Understanding Deep Learning Through Empirical Falsification" at NeurIPS 2022 Workshops, in Proceedings of Machine Learning Research 187:12-26. Available from https://proceedings.mlr.press/v187/zaidi23a.html.
