MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition

Wei Li, Lujun Li, Hao Gu, You-Liang Huang, Mark G. Lee, Shengjie Sun, Wei Xue, Yike Guo
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:35209-35230, 2025.

Abstract

The Mixture of Experts (MoE) architecture improves Large Language Models (LLMs) with better scaling, but its higher parameter counts and memory demands create challenges for deployment. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs that requires no extra training. By harnessing the power of Singular Value Decomposition (SVD), MoE-SVD addresses the critical issues of decomposition collapse and matrix redundancy in MoE architectures. Specifically, we first decompose experts into compact low-rank matrices, resulting in accelerated inference and memory optimization. In particular, we propose a selective decomposition strategy that measures sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. Then, we share a single V-matrix across all experts and employ a top-k selection for U-matrices. This low-rank matrix sharing and trimming scheme allows for significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show that MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5$\times$ faster inference with minimal performance loss.
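
To make the decomposition concrete, the sketch below (not the authors' released code; all function names and the choice of shared V-matrix are illustrative assumptions) shows how an expert weight matrix could be factored with a truncated SVD and how a single V-matrix might be shared across the experts of one MoE layer while each expert keeps its own U-matrix.

# Hedged sketch of the low-rank factorization and V-sharing idea described in the
# abstract. This is NOT the paper's implementation; names and the averaging used to
# build the shared V are placeholder assumptions.
import torch

def truncated_svd(weight: torch.Tensor, rank: int):
    """Factor W (out x in) into low-rank U (out x rank) and V (rank x in)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # fold singular values into U
    V_r = Vh[:rank, :]
    return U_r, V_r

def compress_expert_layer(expert_weights: list, rank: int):
    """Decompose every expert of one MoE layer and share a single V across experts."""
    factors = [truncated_svd(W, rank) for W in expert_weights]
    # Assumption: the shared V is a simple average of the per-expert V factors;
    # the paper derives its shared V-matrix differently, this is only a placeholder.
    shared_V = torch.stack([V for _, V in factors]).mean(dim=0)
    per_expert_U = [U for U, _ in factors]
    return per_expert_U, shared_V

def expert_forward(x: torch.Tensor, U: torch.Tensor, shared_V: torch.Tensor):
    """Low-rank replacement for an expert projection x @ W.T: (x @ V.T) @ U.T."""
    return (x @ shared_V.T) @ U.T

Under such a factorization, storing one shared V-matrix plus one small U-matrix per expert replaces the full weight matrix of every expert, which is the source of the parameter reduction and inference speed-up the abstract reports.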

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-li25az, title = {{M}o{E}-{SVD}: Structured Mixture-of-Experts {LLM}s Compression via Singular Value Decomposition}, author = {Li, Wei and Li, Lujun and Gu, Hao and Huang, You-Liang and Lee, Mark G. and Sun, Shengjie and Xue, Wei and Guo, Yike}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, pages = {35209--35230}, year = {2025}, editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry}, volume = {267}, series = {Proceedings of Machine Learning Research}, month = {13--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/li25az/li25az.pdf}, url = {https://proceedings.mlr.press/v267/li25az.html}, abstract = {Mixture of Experts (MoE) architecture improves Large Language Models (LLMs) with better scaling, but its higher parameter counts and memory demands create challenges for deployment. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs without any extra training. By harnessing the power of Singular Value Decomposition (SVD), MoE-SVD addresses the critical issues of decomposition collapse and matrix redundancy in MoE architectures. Specifically, we first decompose experts into compact low-rank matrices, resulting in accelerated inference and memory optimization. In particular, we propose selective decomposition strategy by measuring sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. Then, we share a single V-matrix across all experts and employ a top-k selection for U-matrices. This low-rank matrix sharing and trimming scheme allows for significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5$\times$ faster inference with minimal performance loss.} }
Endnote
%0 Conference Paper %T MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition %A Wei Li %A Lujun Li %A Hao Gu %A You-Liang Huang %A Mark G. Lee %A Shengjie Sun %A Wei Xue %A Yike Guo %B Proceedings of the 42nd International Conference on Machine Learning %C Proceedings of Machine Learning Research %D 2025 %E Aarti Singh %E Maryam Fazel %E Daniel Hsu %E Simon Lacoste-Julien %E Felix Berkenkamp %E Tegan Maharaj %E Kiri Wagstaff %E Jerry Zhu %F pmlr-v267-li25az %I PMLR %P 35209--35230 %U https://proceedings.mlr.press/v267/li25az.html %V 267 %X Mixture of Experts (MoE) architecture improves Large Language Models (LLMs) with better scaling, but its higher parameter counts and memory demands create challenges for deployment. In this paper, we present MoE-SVD, a new decomposition-based compression framework tailored for MoE LLMs without any extra training. By harnessing the power of Singular Value Decomposition (SVD), MoE-SVD addresses the critical issues of decomposition collapse and matrix redundancy in MoE architectures. Specifically, we first decompose experts into compact low-rank matrices, resulting in accelerated inference and memory optimization. In particular, we propose selective decomposition strategy by measuring sensitivity metrics based on weight singular values and activation statistics to automatically identify decomposable expert layers. Then, we share a single V-matrix across all experts and employ a top-k selection for U-matrices. This low-rank matrix sharing and trimming scheme allows for significant parameter reduction while preserving diversity among experts. Comprehensive experiments on Mixtral, Phi-3.5, DeepSeek, and Qwen2 MoE LLMs show MoE-SVD outperforms other compression methods, achieving a 60% compression ratio and 1.5$\times$ faster inference with minimal performance loss.
APA
Li, W., Li, L., Gu, H., Huang, Y.-L., Lee, M.G., Sun, S., Xue, W. & Guo, Y. (2025). MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:35209-35230. Available from https://proceedings.mlr.press/v267/li25az.html.
