Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities

Stephen Zhang, Vardan Papyan
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:59576-59600, 2024.

Abstract

Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored. We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral. Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel combinatorial search algorithm. We attribute this performance gap to current pruning algorithms’ poor behaviour under overparameterization, their tendency to induce disconnected paths throughout the network, and their propensity to get stuck at suboptimal solutions, even when given the optimal width and initialization. This gap is concerning, given the simplicity of the network architectures and datasets used in our study. We hope that our research encourages further investigation into new pruning techniques that strive for true network sparsity.
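To make the pruning workflow the abstract refers to concrete, below is a minimal, illustrative sketch (not the authors' code or their Cubist Spiral dataset): a small MLP is trained on a generic two-spiral toy problem and then compressed with global magnitude pruning via PyTorch's built-in utilities. The dataset generator, architecture, and 90% sparsity level are assumptions chosen for illustration only; the paper's combinatorial search for ideal sparse networks is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def two_spirals(n=1000, noise=0.05):
    # Generate a classic two-spiral binary classification problem
    # (an illustrative stand-in, not the paper's Cubist Spiral).
    t = torch.sqrt(torch.rand(n)) * 4 * torch.pi
    x0 = torch.stack([t * torch.cos(t), t * torch.sin(t)], dim=1)
    x1 = -x0
    X = torch.cat([x0, x1]) + noise * torch.randn(2 * n, 2)
    y = torch.cat([torch.zeros(n), torch.ones(n)]).long()
    return X, y


model = nn.Sequential(
    nn.Linear(2, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

# Dense pre-training before pruning.
X, y = two_spirals()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()

# Global magnitude (L1) pruning: zero out 90% of weights across all linear layers.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)
for m, name in to_prune:
    prune.remove(m, name)  # make the sparsity permanent

with torch.no_grad():
    acc = (model(X).argmax(1) == y).float().mean()
print(f"post-pruning train accuracy: {acc:.3f}")
```

At this level of sparsity, the pruned weights that survive often fail to form a connected path from input to output, which is one of the failure modes the abstract highlights for magnitude-style criteria.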

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-zhang24av,
  title     = {Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities},
  author    = {Zhang, Stephen and Papyan, Vardan},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {59576--59600},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/zhang24av/zhang24av.pdf},
  url       = {https://proceedings.mlr.press/v235/zhang24av.html},
  abstract  = {Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored. We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral. Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel combinatorial search algorithm. We attribute this performance gap to current pruning algorithms’ poor behaviour under overparameterization, their tendency to induce disconnected paths throughout the network, and their propensity to get stuck at suboptimal solutions, even when given the optimal width and initialization. This gap is concerning, given the simplicity of the network architectures and datasets used in our study. We hope that our research encourages further investigation into new pruning techniques that strive for true network sparsity.}
}
EndNote
%0 Conference Paper
%T Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities
%A Stephen Zhang
%A Vardan Papyan
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-zhang24av
%I PMLR
%P 59576--59600
%U https://proceedings.mlr.press/v235/zhang24av.html
%V 235
%X Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored. We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral. Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel combinatorial search algorithm. We attribute this performance gap to current pruning algorithms’ poor behaviour under overparameterization, their tendency to induce disconnected paths throughout the network, and their propensity to get stuck at suboptimal solutions, even when given the optimal width and initialization. This gap is concerning, given the simplicity of the network architectures and datasets used in our study. We hope that our research encourages further investigation into new pruning techniques that strive for true network sparsity.
APA
Zhang, S. & Papyan, V. (2024). Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:59576-59600. Available from https://proceedings.mlr.press/v235/zhang24av.html.