BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation

Lian Liu, Xiandong Zhao, Guanchen Li, Dong Li, Mengdi Wang, Yinhe Han, Xiaowei Li, Ying Wang
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:40134-40151, 2025.

Abstract

One-shot post-training pruning enhances the deployment of billion-scale large language models (LLMs), with the pruning metric playing a pivotal role in determining which weights to remove. However, existing metrics underperform due to their reliance on a simple symbolic combination of weights and activations, overlooking imbalanced weight magnitudes and the disproportionate influence of activation outliers. To overcome these limitations, we introduce BaWA, a novel pruning metric that systematically Balances Weight and Activation distributions for more effective pruning. BaWA introduces two key innovations: magnitude normalization, which mitigates weight imbalance across channels for fairer pruning decisions, and outlier regularization, which reduces the impact of activation outliers, ensuring more appropriate channel prioritization. To further enhance its effectiveness, BaWA incorporates an efficient and automatic framework for optimizing normalization and regularization hyperparameters. Extensive experiments validate BaWA as a state-of-the-art (SOTA) pruning metric. For instance, applying BaWA to induce 2:4 sparsity in Mistral-7B reduces perplexity in language comprehension by 2.49 and improves average downstream task accuracy by 3.08%, outperforming the previous SOTA method Wanda.
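
To make the idea concrete, the following is a minimal Python/PyTorch sketch of a Wanda-style importance score (|W| times the per-input-channel activation norm) augmented in the spirit of BaWA's two innovations, followed by 2:4 structured pruning. The function names, the per-output-channel mean used for magnitude normalization, and the exponents alpha and beta are illustrative assumptions, not the paper's actual formulation; likewise, the automatic framework BaWA uses to tune its normalization and regularization hyperparameters is not reproduced here.

    import torch

    def bawa_like_score(W, X_norm, alpha=0.5, beta=0.5):
        # W: (out_features, in_features) weight matrix of a linear layer.
        # X_norm: (in_features,) per-input-channel L2 norm of calibration activations.
        W_abs = W.abs()
        # Magnitude normalization (assumption): rescale each output channel by a
        # power of its mean magnitude so uniformly large channels do not dominate.
        channel_scale = W_abs.mean(dim=1, keepdim=True).clamp_min(1e-8)
        W_bal = W_abs / channel_scale.pow(alpha)
        # Outlier regularization (assumption): compress activation norms with an
        # exponent < 1 so a few outlier input channels do not monopolize the budget.
        X_bal = X_norm.clamp_min(1e-8).pow(beta)
        return W_bal * X_bal.unsqueeze(0)  # larger score = more important weight

    def apply_2_4_sparsity(W, score):
        # Zero the 2 lowest-scoring weights in every contiguous group of 4 inputs.
        out_f, in_f = W.shape
        groups = score.view(out_f, in_f // 4, 4)
        drop_idx = groups.topk(2, dim=-1, largest=False).indices
        mask = torch.ones_like(groups, dtype=torch.bool).scatter_(-1, drop_idx, False)
        return W * mask.view(out_f, in_f)

    # Example usage on random data (in_features must be a multiple of 4 for 2:4).
    W = torch.randn(8, 16)
    X_norm = torch.randn(64, 16).norm(dim=0)   # stand-in for calibration activations
    W_pruned = apply_2_4_sparsity(W, bawa_like_score(W, X_norm))
    assert (W_pruned.view(8, 4, 4) != 0).sum(-1).max() <= 2

Setting alpha = 0 and beta = 1 recovers the plain Wanda score in this sketch; the point of the two extra knobs is that weight imbalance and activation outliers can each be damped independently, which is the balance the paper's automatic hyperparameter search is designed to find.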

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-liu25cs,
  title     = {{B}a{WA}: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation},
  author    = {Liu, Lian and Zhao, Xiandong and Li, Guanchen and Li, Dong and Wang, Mengdi and Han, Yinhe and Li, Xiaowei and Wang, Ying},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {40134--40151},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/liu25cs/liu25cs.pdf},
  url       = {https://proceedings.mlr.press/v267/liu25cs.html},
  abstract  = {One-shot post-training pruning enhances the deployment of billion-scale large language models (LLMs), with the pruning metric playing a pivotal role in determining which weights to remove. However, existing metrics underperform due to their reliance on a simple symbolic combination of weights and activations, overlooking imbalanced weight magnitudes and the disproportionate influence of activation outliers. To overcome these limitations, we introduce BaWA, a novel pruning metric that systematically Balances Weight and Activation distributions for more effective pruning. BaWA introduces two key innovations: magnitude normalization, which mitigates weight imbalance across channels for fairer pruning decisions, and outlier regularization, which reduces the impact of activation outliers, ensuring more appropriate channel prioritization. To further enhance its effectiveness, BaWA incorporates an efficient and automatic framework for optimizing normalization and regularization hyperparameters. Extensive experiments validate BaWA as a state-of-the-art (SOTA) pruning metric. For instance, applying BaWA to induce 2:4 sparsity in Mistral-7B reduces perplexity in language comprehension by 2.49 and improves average downstream task accuracy by 3.08%, outperforming the previous SOTA method Wanda.}
}
Endnote
%0 Conference Paper
%T BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation
%A Lian Liu
%A Xiandong Zhao
%A Guanchen Li
%A Dong Li
%A Mengdi Wang
%A Yinhe Han
%A Xiaowei Li
%A Ying Wang
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-liu25cs
%I PMLR
%P 40134--40151
%U https://proceedings.mlr.press/v267/liu25cs.html
%V 267
%X One-shot post-training pruning enhances the deployment of billion-scale large language models (LLMs), with the pruning metric playing a pivotal role in determining which weights to remove. However, existing metrics underperform due to their reliance on a simple symbolic combination of weights and activations, overlooking imbalanced weight magnitudes and the disproportionate influence of activation outliers. To overcome these limitations, we introduce BaWA, a novel pruning metric that systematically Balances Weight and Activation distributions for more effective pruning. BaWA introduces two key innovations: magnitude normalization, which mitigates weight imbalance across channels for fairer pruning decisions, and outlier regularization, which reduces the impact of activation outliers, ensuring more appropriate channel prioritization. To further enhance its effectiveness, BaWA incorporates an efficient and automatic framework for optimizing normalization and regularization hyperparameters. Extensive experiments validate BaWA as a state-of-the-art (SOTA) pruning metric. For instance, applying BaWA to induce 2:4 sparsity in Mistral-7B reduces perplexity in language comprehension by 2.49 and improves average downstream task accuracy by 3.08%, outperforming the previous SOTA method Wanda.
APA
Liu, L., Zhao, X., Li, G., Li, D., Wang, M., Han, Y., Li, X. & Wang, Y. (2025). BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:40134-40151. Available from https://proceedings.mlr.press/v267/liu25cs.html.

Related Material