BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training

Chenyi Yang, Wenjie Nie, Yuxin Zhang, Yuhang Wu, Xiawu Zheng, Guannan Jiang, Rongrong Ji
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:70943-70952, 2025.

Abstract

N:M sparsity has become an increasingly important tool for DNN compression, achieving practical speedups by requiring at most N non-zero components within every M consecutive weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation that updates all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME keeps both forward and backward propagation sparse at all times, while iteratively performing weight pruning-and-regrowing within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment of accumulated mask oscillation frequency and the expected loss reduction of mask adaptation, ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, showing that it performs comparably to or better than previous works that maintain fully dense backward propagation during training. For instance, BAME attains 72.0% top-1 accuracy when training a 1:16 sparse ResNet-50 on ImageNet, surpassing SR-STE by 0.5% while achieving a 2.37× reduction in training FLOPs. Code is released at https://github.com/BAME-xmu/BAME.
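
To make the mechanism described above concrete, below is a minimal PyTorch sketch of magnitude-based N:M mask generation and a single prune-and-regrow step. This is an illustrative sketch, not the authors' implementation (see https://github.com/BAME-xmu/BAME for the released code): the function names are hypothetical, gradient magnitude is used as a simple proxy for "expected loss reduction", and the block-selection score combining mask oscillation frequency with expected loss reduction is not reproduced.

    # Minimal sketch of magnitude-based N:M masking and one prune-and-regrow step,
    # written against the description in the abstract. Function names and the use
    # of |gradient| as a proxy for "expected loss reduction" are illustrative
    # assumptions; the block-selection scoring is omitted.
    import torch


    def nm_mask(weight: torch.Tensor, n: int = 1, m: int = 16) -> torch.Tensor:
        """Keep the n largest-magnitude weights in every group of m consecutive weights."""
        groups = weight.reshape(-1, m)                      # (num_groups, m)
        keep = groups.abs().topk(n, dim=1).indices          # survivors per group
        mask = torch.zeros_like(groups)
        mask.scatter_(1, keep, 1.0)
        return mask.reshape(weight.shape)


    def prune_and_regrow(weight, grad, mask, m: int = 16):
        """One mask-evolution step: per group, drop the weakest kept weight and
        regrow the pruned weight with the largest gradient magnitude."""
        w, g, mk = (t.reshape(-1, m).clone() for t in (weight, grad, mask))
        # prune: smallest-magnitude weight among those currently kept
        kept_mag = torch.where(mk.bool(), w.abs(), torch.full_like(w, float("inf")))
        mk.scatter_(1, kept_mag.argmin(dim=1, keepdim=True), 0.0)
        # regrow: pruned weight whose gradient magnitude promises the largest loss drop
        grow_score = torch.where(mk.bool(), torch.full_like(g, -1.0), g.abs())
        mk.scatter_(1, grow_score.argmax(dim=1, keepdim=True), 1.0)
        return mk.reshape(weight.shape)


    # Toy usage on a random 1:16 sparse layer.
    w = torch.randn(64, 64)
    mask = nm_mask(w, n=1, m=16)
    g = torch.randn_like(w)                    # stand-in for a (sparse) gradient
    mask = prune_and_regrow(w, g, mask, m=16)
    assert int(mask.reshape(-1, 16).sum(dim=1).max()) == 1   # 1:16 constraint holds

In BAME itself, pruning-and-regrowing is applied only within the blocks selected at each update step (scored by accumulated mask oscillation frequency and expected loss reduction), which is what keeps both forward and backward passes sparse; the sketch applies one step to every group of a single layer purely for illustration.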

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-yang25v, title = {{BAME}: Block-Aware Mask Evolution for Efficient {N}:{M} Sparse Training}, author = {Yang, Chenyi and Nie, Wenjie and Zhang, Yuxin and Wu, Yuhang and Zheng, Xiawu and Jiang, Guannan and Ji, Rongrong}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {70943--70952}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/yang25v/yang25v.pdf}, url = {https://proceedings.mlr.press/v267/yang25v.html}, abstract = {N:M sparsity stands as a progressively important tool for DNN compression, achieving practical speedups by stipulating at most N non-zero components within M sequential weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation to update all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME perpetually keeps both sparse forward and backward propagation, while iteratively performing weight pruning-and-regrowing within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment based on accumulated mask oscillation frequency and expected loss reduction of mask adaptation, thereby ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, illustrating it performs comparably to or better than previous works that fully maintaining dense backward propagation during training. For instance, BAME attains a 72.0% top-1 accuracy while training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5%, despite achieving 2.37 training FLOPs reduction. Code is released at https://github.com/BAME-xmu/BAME} }
Endnote
%0 Conference Paper %T BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training %A Chenyi Yang %A Wenjie Nie %A Yuxin Zhang %A Yuhang Wu %A Xiawu Zheng %A Guannan Jiang %A Rongrong Ji %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-yang25v %I PMLR %P 70943--70952 %U https://proceedings.mlr.press/v267/yang25v.html %V 267 %X N:M sparsity stands as a progressively important tool for DNN compression, achieving practical speedups by stipulating at most N non-zero components within M sequential weights. Unfortunately, most existing works identify the N:M sparse mask through dense backward propagation to update all weights, which incurs exorbitant training costs. In this paper, we introduce BAME, a method that maintains consistent sparsity throughout the N:M sparse training process. BAME perpetually keeps both sparse forward and backward propagation, while iteratively performing weight pruning-and-regrowing within designated weight blocks to tailor the N:M mask. These blocks are selected through a joint assessment based on accumulated mask oscillation frequency and expected loss reduction of mask adaptation, thereby ensuring stable and efficient identification of the optimal N:M mask. Our empirical results substantiate the effectiveness of BAME, illustrating it performs comparably to or better than previous works that fully maintaining dense backward propagation during training. For instance, BAME attains a 72.0% top-1 accuracy while training a 1:16 sparse ResNet-50 on ImageNet, eclipsing SR-STE by 0.5%, despite achieving 2.37 training FLOPs reduction. Code is released at https://github.com/BAME-xmu/BAME
APA
Yang, C., Nie, W., Zhang, Y., Wu, Y., Zheng, X., Jiang, G. & Ji, R. (2025). BAME: Block-Aware Mask Evolution for Efficient N:M Sparse Training. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:70943-70952. Available from https://proceedings.mlr.press/v267/yang25v.html.