Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?

Ning Liu, Geng Yuan, Zhengping Che, Xuan Shen, Xiaolong Ma, Qing Jin, Jian Ren, Jian Tang, Sijia Liu, Yanzhi Wang
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:7011-7020, 2021.

Abstract

In deep model compression, the recent "Lottery Ticket Hypothesis" (LTH) finding pointed out that there can exist a winning ticket (i.e., a properly pruned sub-network together with the original weight initialization) that achieves performance competitive with the original dense network. However, this winning property is not easy to observe in many scenarios, for example when a relatively large learning rate is used, even though such a learning rate benefits training of the original dense model. In this work, we investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between the initialized weights and the final trained weights when the learning rate is not sufficiently large. Thus, the existence of the winning property is correlated with insufficient DNN pretraining, and is unlikely to occur for a well-trained DNN. To overcome this limitation, we propose the "pruning & fine-tuning" method, which consistently outperforms lottery ticket sparse training under the same pruning algorithm and the same total number of training epochs. Extensive experiments over multiple deep models (VGG, ResNet, MobileNet-v2) on different datasets have been conducted to justify our proposals.
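As a rough illustration of the two settings the abstract compares, the sketch below builds one pruning mask and applies it to two different starting points: the initial weights (a lottery ticket) and the trained weights (pruning and fine-tuning). It assumes simple magnitude pruning and uses made-up toy weight vectors; the abstract does not name the pruning algorithm, and the helper names `magnitude_mask` and `apply_mask` are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude weights.

    Magnitude pruning is an assumption made for this sketch; the abstract
    only says both methods share "the same pruning algorithm".
    """
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out the pruned positions of a weight vector."""
    return [w * m for w, m in zip(weights, mask)]


# Toy values, invented purely for illustration.
init_weights = [0.5, -0.1, 0.3, -0.7, 0.2, 0.05]
trained_weights = [0.9, -0.05, 0.1, -1.2, 0.6, 0.02]

# The mask is derived from the trained dense network in both cases.
mask = magnitude_mask(trained_weights, sparsity=0.5)

# Lottery ticket: surviving weights are reset to their initial values,
# then the sparse network is trained again.
lottery_start = apply_mask(init_weights, mask)

# Pruning & fine-tuning: surviving weights keep their trained values,
# then the sparse network is fine-tuned.
finetune_start = apply_mask(trained_weights, mask)
```

The only difference between the two starting points is which weight values sit under the mask, which is why the relationship between initial and trained weights matters for whether the lottery ticket can win.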

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-liu21aa,
  title = {Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?},
  author = {Liu, Ning and Yuan, Geng and Che, Zhengping and Shen, Xuan and Ma, Xiaolong and Jin, Qing and Ren, Jian and Tang, Jian and Liu, Sijia and Wang, Yanzhi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages = {7011--7020},
  year = {2021},
  editor = {Meila, Marina and Zhang, Tong},
  volume = {139},
  series = {Proceedings of Machine Learning Research},
  month = {18--24 Jul},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v139/liu21aa/liu21aa.pdf},
  url = {http://proceedings.mlr.press/v139/liu21aa.html},
  abstract = {In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance than the original dense network. However, it is not easy to observe such winning property in many scenarios, where for example, a relatively large learning rate is used even if it benefits training the original dense model. In this work, we investigate the underlying condition and rationale behind the winning property, and find that the underlying reason is largely attributed to the correlation between initialized weights and final-trained weights when the learning rate is not sufficiently large. Thus, the existence of winning property is correlated with an insufficient DNN pretraining, and is unlikely to occur for a well-trained DNN. To overcome this limitation, we propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training under the same pruning algorithm and the same total training epochs. Extensive experiments over multiple deep models (VGG, ResNet, MobileNet-v2) on different datasets have been conducted to justify our proposals.}
}
Endnote
%0 Conference Paper
%T Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?
%A Ning Liu
%A Geng Yuan
%A Zhengping Che
%A Xuan Shen
%A Xiaolong Ma
%A Qing Jin
%A Jian Ren
%A Jian Tang
%A Sijia Liu
%A Yanzhi Wang
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-liu21aa
%I PMLR
%P 7011--7020
%U http://proceedings.mlr.press/v139/liu21aa.html
%V 139
%X In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance than the original dense network. However, it is not easy to observe such winning property in many scenarios, where for example, a relatively large learning rate is used even if it benefits training the original dense model. In this work, we investigate the underlying condition and rationale behind the winning property, and find that the underlying reason is largely attributed to the correlation between initialized weights and final-trained weights when the learning rate is not sufficiently large. Thus, the existence of winning property is correlated with an insufficient DNN pretraining, and is unlikely to occur for a well-trained DNN. To overcome this limitation, we propose the "pruning & fine-tuning" method that consistently outperforms lottery ticket sparse training under the same pruning algorithm and the same total training epochs. Extensive experiments over multiple deep models (VGG, ResNet, MobileNet-v2) on different datasets have been conducted to justify our proposals.
APA
Liu, N., Yuan, G., Che, Z., Shen, X., Ma, X., Jin, Q., Ren, J., Tang, J., Liu, S., & Wang, Y. (2021). Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not? Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:7011-7020. Available from http://proceedings.mlr.press/v139/liu21aa.html.
