Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

Zhangheng Li, Shiwei Liu, Tianlong Chen, Ajay Kumar Jaiswal, Zhenyu Zhang, Dilin Wang, Raghuraman Krishnamoorthi, Shiyu Chang, Zhangyang Wang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:28368-28386, 2024.

Abstract

Sparse Neural Networks (SNNs) have received considerable attention for mitigating the explosion in computational costs and memory footprints of modern deep neural networks. Despite their popularity, most state-of-the-art training approaches seek a single high-quality sparse subnetwork with a preset sparsity pattern and ratio, which makes them ill-suited to the variability of platforms and resources. Recently proposed approaches jointly train multiple subnetworks (which we term “sparse co-training”) with a fixed sparsity pattern, allowing the sparsity ratio to be switched according to resource requirements. In this work, we take one step further and expand the scope of sparse co-training to cover diverse sparsity patterns and multiple sparsity ratios at once. We introduce Sparse Cocktail, the first sparse co-training framework that co-trains a suite of sparsity patterns simultaneously, loaded with multiple sparsity ratios, which facilitates harmonious switching across sparsity patterns and ratios at inference depending on hardware availability. More specifically, Sparse Cocktail alternately trains subnetworks generated from different sparsity patterns, with a gradual increase in sparsity ratios across patterns. It relies on a unified mask generation process and Dense Pivot Co-training to ensure that subnetworks of different patterns orchestrate their shared parameters without canceling each other’s performance. Experimental results on image classification, object detection, and instance segmentation illustrate the effectiveness and flexibility of Sparse Cocktail, pointing to a promising direction for sparse co-training. Code will be released.
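The abstract does not specify how the masks are generated or how the alternation is scheduled. The following is therefore a minimal, hypothetical Python sketch, not the authors' implementation. It derives masks for three illustrative sparsity patterns (unstructured, N:M, and channel-wise, assumed here for illustration) from the magnitudes of one shared weight matrix. It also builds a training plan that interleaves a dense "pivot" step with sparse steps whose ratios increase gradually across patterns. The function names `make_mask` and `cocktail_schedule`, the magnitude criterion, the fixed group size `m=4`, and the dense/sparse interleaving are all assumptions.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]


def unstructured_mask(w: Matrix, sparsity: float) -> Mask:
    """Keep the largest-magnitude weights anywhere in the layer."""
    cols = len(w[0])
    # sorted() is stable, so ties keep index order and rankings are reproducible.
    order = sorted(range(len(w) * cols), key=lambda k: -abs(w[k // cols][k % cols]))
    keep = set(order[: round(len(order) * (1.0 - sparsity))])
    return [[1 if (r * cols + c) in keep else 0 for c in range(cols)] for r in range(len(w))]


def nm_mask(w: Matrix, n: int, m: int) -> Mask:
    """In every group of m consecutive weights in a row, keep the n largest."""
    mask = []
    for row in w:
        if len(row) % m:
            raise ValueError("row length must be a multiple of m")
        out = [0] * len(row)
        for start in range(0, len(row), m):
            group = sorted(range(start, start + m), key=lambda k: -abs(row[k]))
            for k in group[:n]:
                out[k] = 1
        mask.append(out)
    return mask


def channel_mask(w: Matrix, sparsity: float) -> Mask:
    """Prune whole output channels (rows) with the smallest L1 norm."""
    order = sorted(range(len(w)), key=lambda r: -sum(abs(v) for v in w[r]))
    keep = set(order[: round(len(w) * (1.0 - sparsity))])
    return [[1 if r in keep else 0] * len(w[r]) for r in range(len(w))]


def sparsity_of(mask: Mask) -> float:
    """Fraction of zeroed entries in a mask."""
    total = sum(len(row) for row in mask)
    return 1.0 - sum(map(sum, mask)) / total


def make_mask(w: Matrix, pattern: str, sparsity: float, m: int = 4) -> Mask:
    """Hypothetical unified entry point: every pattern is derived from the same weights."""
    if pattern == "unstructured":
        return unstructured_mask(w, sparsity)
    if pattern == "nm":
        return nm_mask(w, round(m * (1.0 - sparsity)), m)
    if pattern == "channel":
        return channel_mask(w, sparsity)
    raise ValueError(f"unknown pattern: {pattern}")


def cocktail_schedule(ratios: Dict[str, List[float]], num_steps: int) -> List[Tuple[str, float]]:
    """Hypothetical plan: a dense pivot step before each sparse step; sparse steps
    visit every pattern at its i-th (ascending) ratio before moving to ratio i+1."""
    sparse_steps = []
    for i in range(max(len(r) for r in ratios.values())):
        for pattern, levels in ratios.items():
            if i < len(levels):
                sparse_steps.append((pattern, levels[i]))
    if not sparse_steps:
        raise ValueError("no sparse steps to schedule")
    plan: List[Tuple[str, float]] = []
    k = 0
    while len(plan) < num_steps:
        plan.append(("dense", 0.0))
        if len(plan) < num_steps:
            plan.append(sparse_steps[k % len(sparse_steps)])
            k += 1
    return plan


if __name__ == "__main__":
    w = [[0.9, -0.1, 0.4, 0.05],
         [0.2, 0.8, -0.7, 0.3],
         [-0.05, 0.02, 0.1, -0.01],
         [0.6, -0.5, 0.0, 0.3]]
    ratios = {"unstructured": [0.5, 0.75], "nm": [0.5, 0.75], "channel": [0.25, 0.5]}
    for pattern, s in cocktail_schedule(ratios, num_steps=6):
        if pattern == "dense":
            continue  # a real dense pivot step would update all shared weights here
        print(pattern, s, round(sparsity_of(make_mask(w, pattern, s)), 3))
```

Because all unstructured masks come from one stable magnitude ranking, a higher-sparsity mask is always a subset of a lower-sparsity one. That is one simple way for subnetworks at different ratios to share parameters without conflicting. Whether Sparse Cocktail relies on this exact property is not stated in the abstract.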

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-li24av,
  title     = {Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once},
  author    = {Li, Zhangheng and Liu, Shiwei and Chen, Tianlong and Jaiswal, Ajay Kumar and Zhang, Zhenyu and Wang, Dilin and Krishnamoorthi, Raghuraman and Chang, Shiyu and Wang, Zhangyang},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {28368--28386},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/li24av/li24av.pdf},
  url       = {https://proceedings.mlr.press/v235/li24av.html}
}
EndNote
%0 Conference Paper
%T Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once
%A Zhangheng Li
%A Shiwei Liu
%A Tianlong Chen
%A Ajay Kumar Jaiswal
%A Zhenyu Zhang
%A Dilin Wang
%A Raghuraman Krishnamoorthi
%A Shiyu Chang
%A Zhangyang Wang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-li24av
%I PMLR
%P 28368--28386
%U https://proceedings.mlr.press/v235/li24av.html
%V 235
APA
Li, Z., Liu, S., Chen, T., Jaiswal, A.K., Zhang, Z., Wang, D., Krishnamoorthi, R., Chang, S. & Wang, Z. (2024). Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:28368-28386. Available from https://proceedings.mlr.press/v235/li24av.html.
