ERC-SVD: Error-Controlled SVD for Large Language Model Compression

Haolei Bai, Siyong Jian, Tuo Liang, Yu Yin, Huan Wang
Conference on Parsimony and Learning, PMLR 328:698-719, 2026.

Abstract

Large language models (LLMs) have demonstrated impressive capabilities in a wide range of downstream natural language processing tasks. Nevertheless, their considerable size and memory demands hinder practical deployment, underscoring the importance of developing efficient compression strategies. Singular value decomposition (SVD) decomposes a matrix into orthogonal components, enabling efficient low-rank approximation. This makes it particularly suitable for LLM compression, where weight matrices often exhibit significant redundancy. However, current SVD-based methods neglect the residual matrix from truncation, resulting in significant truncation loss. Additionally, compressing all layers of the model results in severe error propagation. To overcome these limitations, we propose ERC-SVD, a new post-training SVD-based LLM compression method designed from an error-controlled perspective. Specifically, we leverage the residual matrix generated during the truncation process to reduce truncation loss. Moreover, under a fixed overall compression ratio, we selectively compress the last few layers of the model, which mitigates error propagation and improves compressed model performance. Comprehensive evaluations on diverse LLM families and multiple benchmark datasets indicate that ERC-SVD consistently outperforms existing methods, demonstrating its practical effectiveness.
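To make the two ideas in the abstract concrete, below is a minimal NumPy sketch, not the authors' released implementation. svd_truncate shows where the truncation residual comes from (plain truncation discards E; ERC-SVD reuses it, in a form the paper specifies), and compress_tail shows one way to leave early layers intact and spend the entire parameter budget implied by a fixed overall compression ratio on the last layers. The function names, the budget arithmetic, and the n_skip parameter are illustrative assumptions.

import numpy as np

def svd_truncate(W, rank):
    # Thin SVD; keep only the top-`rank` singular triplets.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_r = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
    E = W - W_r  # residual that plain truncation throws away; ERC-SVD reuses it
    return W_r, E

def compress_tail(weights, keep_ratio, n_skip):
    # Keep the first `n_skip` layers dense; spend the whole parameter
    # budget implied by `keep_ratio` on compressing the remaining layers.
    total = sum(W.size for W in weights)
    skipped = sum(W.size for W in weights[:n_skip])
    budget = keep_ratio * total - skipped
    assert budget > 0, "n_skip too large for this keep_ratio"
    tail_total = sum(W.size for W in weights[n_skip:])
    frac = budget / tail_total  # fraction of each tail layer's params to keep
    out = list(weights[:n_skip])
    for W in weights[n_skip:]:
        m, n = W.shape
        # A rank-r factorization stores r*(m+n) numbers; pick the largest
        # r satisfying r*(m+n) <= frac*m*n.
        r = max(1, int(frac * m * n / (m + n)))
        W_r, E = svd_truncate(W, r)
        out.append(W_r)  # in practice one would store the factors, not W_r
    return out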

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-bai26a,
  title     = {ERC-SVD: Error-Controlled SVD for Large Language Model Compression},
  author    = {Bai, Haolei and Jian, Siyong and Liang, Tuo and Yin, Yu and Wang, Huan},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {698--719},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/bai26a/bai26a.pdf},
  url       = {https://proceedings.mlr.press/v328/bai26a.html}
}
Endnote
%0 Conference Paper
%T ERC-SVD: Error-Controlled SVD for Large Language Model Compression
%A Haolei Bai
%A Siyong Jian
%A Tuo Liang
%A Yu Yin
%A Huan Wang
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-bai26a
%I PMLR
%P 698--719
%U https://proceedings.mlr.press/v328/bai26a.html
%V 328
APA
Bai, H., Jian, S., Liang, T., Yin, Y. & Wang, H. (2026). ERC-SVD: Error-Controlled SVD for Large Language Model Compression. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:698-719. Available from https://proceedings.mlr.press/v328/bai26a.html.