BaWA: Automatic Optimizing Pruning Metric for Large Language Models with Balanced Weight and Activation
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:40134-40151, 2025.
Abstract
One-shot post-training pruning enhances the deployment of billion-scale large language models (LLMs), with the pruning metric playing a pivotal role in determining which weights to remove. However, existing metrics underperform because they rely on a simple symbolic combination of weights and activations, overlooking imbalanced weight magnitudes and the disproportionate influence of activation outliers. To overcome these limitations, we propose BaWA, a novel pruning metric that systematically Balances Weight and Activation distributions for more effective pruning. BaWA rests on two key innovations: magnitude normalization, which mitigates weight imbalance across channels for fairer pruning decisions, and outlier regularization, which reduces the impact of activation outliers, ensuring more appropriate channel prioritization. To further enhance its effectiveness, BaWA incorporates an efficient and automatic framework for optimizing the normalization and regularization hyperparameters. Extensive experiments validate BaWA as a state-of-the-art (SOTA) pruning metric. For instance, applying BaWA to induce 2:4 sparsity in Mistral-7B reduces language-comprehension perplexity by 2.49 and improves average downstream task accuracy by 3.08%, outperforming the previous SOTA method, Wanda.
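
To make the two ideas in the abstract concrete, the sketch below contrasts a Wanda-style score (|W_ij| times the input-channel activation norm) with a "balanced" variant that adds per-channel magnitude normalization and outlier-dampening on the activation norms. This is a minimal illustration, not BaWA's actual formula: the channel convention, the power-based normalization/regularization, and the hyperparameters `alpha` and `beta` are assumptions made for exposition; the paper searches such hyperparameters automatically.

```python
import torch

def wanda_score(W: torch.Tensor, x_norm: torch.Tensor) -> torch.Tensor:
    """Baseline Wanda-style metric: |W_ij| * ||X_j||_2 per weight.

    W: [out_features, in_features] weight matrix.
    x_norm: [in_features] L2 norm of each input channel's activations.
    """
    return W.abs() * x_norm.unsqueeze(0)

def balanced_score(W: torch.Tensor, x_norm: torch.Tensor,
                   alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """Illustrative 'balanced' metric (assumed form, not the paper's).

    - Magnitude normalization: divide each input channel's weight magnitudes
      by a power of that channel's mean magnitude, so channels with uniformly
      large weights do not dominate pruning decisions.
    - Outlier regularization: raise activation norms to a power beta < 1 to
      dampen the influence of activation outliers.
    alpha and beta are hypothetical hyperparameters standing in for the
    quantities BaWA tunes automatically.
    """
    w_mag = W.abs()
    channel_scale = w_mag.mean(dim=0, keepdim=True).clamp_min(1e-8) ** alpha
    regularized_act = x_norm.clamp_min(1e-8) ** beta
    return (w_mag / channel_scale) * regularized_act.unsqueeze(0)

def two_four_mask(score: torch.Tensor) -> torch.Tensor:
    """Keep the 2 highest-scoring weights in every group of 4 along the
    input dimension, i.e. a 2:4 structured-sparsity mask."""
    out_dim, in_dim = score.shape
    groups = score.reshape(out_dim, in_dim // 4, 4)
    topk = groups.topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, topk, True)
    return mask.reshape(out_dim, in_dim)

# Usage: score a layer's weights with calibration-set activation norms,
# then zero the weights excluded by the 2:4 mask.
W = torch.randn(128, 256)
x_norm = torch.rand(256)
mask = two_four_mask(balanced_score(W, x_norm))
W_pruned = W * mask
```

The contrast with `wanda_score` is the point: both metrics multiply weight magnitude by an activation statistic, but the balanced variant rescales channels before comparing them, which is the kind of imbalance the abstract argues a plain symbolic combination ignores.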