Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset

Ilan Price, Jared Tanner
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:8620-8629, 2021.

Abstract

That neural networks may be pruned to high sparsities and retain high accuracy is well established. Recent research efforts focus on pruning immediately after initialization so as to allow the computational savings afforded by sparsity to extend to the training process. In this work, we introduce a new ‘DCT plus Sparse’ layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable parameters remaining. We show that standard training of networks built with these layers, and pruned at initialization, achieves state-of-the-art accuracy for extreme sparsities on a variety of benchmark network architectures and datasets. Moreover, these results are achieved using only simple heuristics to determine the locations of the trainable parameters in the network, and thus without having to initially store or compute with the full, unpruned network, as is required by competing prune-at-initialization algorithms. Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead.
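
The 'DCT plus Sparse' name suggests that each layer's weight is the sum of a fixed discrete cosine transform (DCT) offset and a sparse trainable matrix whose support is fixed at initialization. As an illustration only, the following minimal PyTorch sketch shows one way such a layer could look; the class name `DCTPlusSparseLinear`, the `density` parameter, and the random-support heuristic are assumptions made for this example, not the authors' implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def dct_matrix(rows: int, cols: int) -> torch.Tensor:
    """Orthonormal DCT-II matrix, cropped to the requested layer shape."""
    n = max(rows, cols)
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # sample index
    m = torch.cos(math.pi * (i + 0.5) * k / n) * math.sqrt(2.0 / n)
    m[0, :] /= math.sqrt(2.0)                                # orthonormal first row
    return m[:rows, :cols]


class DCTPlusSparseLinear(nn.Module):
    """Linear layer whose weight is a fixed DCT offset plus a sparse trainable part.

    Hypothetical illustration: only the sparse component is trained, and the
    dense DCT offset is never updated.
    """

    def __init__(self, in_features: int, out_features: int, density: float = 0.01):
        super().__init__()
        # Fixed, non-trainable dense offset (stored as a buffer, never updated).
        self.register_buffer("dct", dct_matrix(out_features, in_features))
        # Random support for the sparse trainable component; `density` is the
        # fraction of entries kept trainable (an assumed, simple heuristic).
        self.register_buffer(
            "mask", (torch.rand(out_features, in_features) < density).float()
        )
        self.sparse_weight = nn.Parameter(
            0.01 * torch.randn(out_features, in_features)
        )
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Masking zeroes both the values and the gradients of pruned entries.
        weight = self.dct + self.mask * self.sparse_weight
        return F.linear(x, weight, self.bias)


if __name__ == "__main__":
    layer = DCTPlusSparseLinear(256, 128, density=0.01)
    print(layer(torch.randn(4, 256)).shape)   # torch.Size([4, 128])
```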

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-price21a,
  title     = {Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset},
  author    = {Price, Ilan and Tanner, Jared},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {8620--8629},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/price21a/price21a.pdf},
  url       = {https://proceedings.mlr.press/v139/price21a.html}
}
Endnote
%0 Conference Paper
%T Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset
%A Ilan Price
%A Jared Tanner
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-price21a
%I PMLR
%P 8620--8629
%U https://proceedings.mlr.press/v139/price21a.html
%V 139
APA
Price, I. & Tanner, J. (2021). Dense for the Price of Sparse: Improved Performance of Sparsely Initialized Networks via a Subspace Offset. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:8620-8629. Available from https://proceedings.mlr.press/v139/price21a.html.
