Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:6989-7000, 2021.

Abstract

In this paper, we introduce a new perspective on training deep neural networks that achieves state-of-the-art performance without expensive dense over-parameterization, by proposing the concept of In-Time Over-Parameterization (ITOP) in sparse training. By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform over-parameterization over the course of training, closing the gap in expressibility between sparse training and dense training. We further use ITOP to understand the underlying mechanism of Dynamic Sparse Training (DST) and find that the benefits of DST come from its ability to consider, across time, all possible parameters when searching for the optimal sparse connectivity. As long as sufficiently many parameters have been reliably explored, DST can outperform the dense neural network by a large margin. We present a series of experiments to support our conjecture and achieve state-of-the-art sparse training performance with ResNet-50 on ImageNet. More impressively, ITOP achieves dominant performance over the over-parameterization-based sparse methods at extreme sparsities. When trained with ResNet-34 on CIFAR-100, ITOP can match the performance of the dense model at an extreme sparsity of 98%.
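
To make the prune-and-regrow cycle behind DST, and the exploration measure behind ITOP, concrete, the following minimal PyTorch sketch (not the authors' released implementation; the function name prune_and_grow, the random growth criterion, and the 0.3 drop fraction are illustrative assumptions) drops the lowest-magnitude active weights of one layer, regrows the same number at inactive positions, and tracks the fraction of positions that have ever been active, i.e., how much of the dense parameter space has been explored over time.

import torch

def prune_and_grow(weight, mask, explored, drop_fraction=0.3):
    # One illustrative dynamic-sparse-training update for a single layer:
    # drop the lowest-magnitude active weights, regrow the same number at
    # random inactive positions, and record every position ever activated.
    flat_w, flat_m = weight.view(-1), mask.view(-1)
    n_drop = int(drop_fraction * flat_m.sum().item())
    if n_drop > 0:
        # Prune: among active weights, deactivate the n_drop smallest in magnitude.
        scores = torch.where(flat_m.bool(), flat_w.abs(), torch.full_like(flat_w, float("inf")))
        drop_idx = torch.topk(scores, n_drop, largest=False).indices
        flat_m[drop_idx] = 0.0
        # Grow: reactivate n_drop inactive positions, chosen at random here
        # (gradient-based growth is another common criterion); new weights start at zero.
        inactive = (flat_m == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive[torch.randperm(inactive.numel())[:n_drop]]
        flat_m[grow_idx] = 1.0
        flat_w[grow_idx] = 0.0
    # ITOP-style bookkeeping: union of all connectivities visited so far.
    explored |= mask.bool()
    return mask, explored

# Usage sketch: a 100x100 layer kept at 90% sparsity while coverage of the
# dense parameter space grows toward 100% over repeated topology updates.
w = torch.randn(100, 100)
m = (torch.rand_like(w) < 0.1).float()
explored = m.bool().clone()
for _ in range(50):  # in practice, one update every few hundred training iterations
    m, explored = prune_and_grow(w, m, explored)
print(f"fraction of parameters explored so far: {explored.float().mean().item():.2%}")

The layer stays at its target sparsity throughout (the mask always has the same number of ones), while the explored set keeps growing; ITOP's argument is that performance tracks how large this explored set becomes during training.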

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-liu21y,
  title     = {Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training},
  author    = {Liu, Shiwei and Yin, Lu and Mocanu, Decebal Constantin and Pechenizkiy, Mykola},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {6989--7000},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/liu21y/liu21y.pdf},
  url       = {https://proceedings.mlr.press/v139/liu21y.html}
}
APA
Liu, S., Yin, L., Mocanu, D. C., & Pechenizkiy, M. (2021). Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:6989-7000. Available from https://proceedings.mlr.press/v139/liu21y.html.
