LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Yixiao Li; Yifan Yu; Qingru Zhang; Chen Liang; Pengcheng He; Weizhu Chen; Tuo Zhao

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:20336-20350, 2023.

Abstract

Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To re- duce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse ap- proximation), a novel model compression tech- nique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low- rank approximations and pruning, while avoid- ing their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approxima- tions, and low-rank approximation prevents prun- ing from losing too many expressive neurons. We evaluate our method on natural language under- standing, question answering, and natural lan- guage generation tasks. We show that it signif- icantly outperforms existing compression meth- ods. Our code is publicly available at https: //github.com/yxli2123/LoSparse

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-li23ap,
  title = 	 {{L}o{S}parse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation},
  author =       {Li, Yixiao and Yu, Yifan and Zhang, Qingru and Liang, Chen and He, Pengcheng and Chen, Weizhu and Zhao, Tuo},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {20336--20350},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/li23ap/li23ap.pdf},
  url = 	 {https://proceedings.mlr.press/v202/li23ap.html},
  abstract = 	 {Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To re- duce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse ap- proximation), a novel model compression tech- nique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low- rank approximations and pruning, while avoid- ing their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approxima- tions, and low-rank approximation prevents prun- ing from losing too many expressive neurons. We evaluate our method on natural language under- standing, question answering, and natural lan- guage generation tasks. We show that it signif- icantly outperforms existing compression meth- ods. Our code is publicly available at https: //github.com/yxli2123/LoSparse}
}

Endnote

%0 Conference Paper
%T LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
%A Yixiao Li
%A Yifan Yu
%A Qingru Zhang
%A Chen Liang
%A Pengcheng He
%A Weizhu Chen
%A Tuo Zhao
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-li23ap
%I PMLR
%P 20336--20350
%U https://proceedings.mlr.press/v202/li23ap.html
%V 202
%X Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To re- duce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse ap- proximation), a novel model compression tech- nique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low- rank approximations and pruning, while avoid- ing their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approxima- tions, and low-rank approximation prevents prun- ing from losing too many expressive neurons. We evaluate our method on natural language under- standing, question answering, and natural lan- guage generation tasks. We show that it signif- icantly outperforms existing compression meth- ods. Our code is publicly available at https: //github.com/yxli2123/LoSparse

APA


Li, Y., Yu, Y., Zhang, Q., Liang, C., He, P., Chen, W. & Zhao, T.. (2023). LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:20336-20350 Available from https://proceedings.mlr.press/v202/li23ap.html.

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Abstract

Cite this Paper

Related Material