SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot

Elias Frantar; Dan Alistarh

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10323-10337, 2023.

Abstract

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
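To make the semi-structured patterns mentioned in the abstract concrete, here is a minimal sketch of what an n:m constraint looks like on a single row of weights. It assumes the convention that n of every m consecutive weights are zeroed (for 2:4 and 4:8, either reading of n:m gives 50% sparsity). The helper names `prune_n_of_m` and `sparsity` are hypothetical, and plain magnitude ranking is used only as a stand-in selection rule: this is not the SparseGPT criterion, and no weights are updated after pruning.

```python
def prune_n_of_m(weights, n, m):
    """Zero the n smallest-magnitude entries in every group of m consecutive weights.

    Illustrative only: magnitude ranking is an assumed stand-in, not SparseGPT.
    """
    if len(weights) % m != 0:
        raise ValueError("len(weights) must be a multiple of m")
    pruned = list(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        # Indices of the n smallest |w| in this group (ties broken by position).
        drop = sorted(group, key=lambda i: abs(weights[i]))[:n]
        for i in drop:
            pruned[i] = 0.0
    return pruned


def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    return sum(1 for w in weights if w == 0) / len(weights)


row = [0.9, -0.8, 0.7, 0.6, -0.1, 0.2, 0.3, -0.4]

row_2_4 = prune_n_of_m(row, n=2, m=4)  # two zeros in each block of four
row_4_8 = prune_n_of_m(row, n=4, m=8)  # four zeros in each block of eight

print(row_2_4, sparsity(row_2_4))
print(row_4_8, sparsity(row_4_8))
```

Both masks reach the same 50% sparsity, but the 4:8 pattern is free to keep all four large weights in the first half of the row, whereas 2:4 must drop two of them. That extra freedom is why the two patterns are listed separately.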

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-frantar23a,
  title     = {{S}parse{GPT}: Massive Language Models Can be Accurately Pruned in One-Shot},
  author    = {Frantar, Elias and Alistarh, Dan},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages     = {10323--10337},
  year      = {2023},
  editor    = {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume    = {202},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--29 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v202/frantar23a/frantar23a.pdf},
  url       = {https://proceedings.mlr.press/v202/frantar23a.html},
  abstract = 	 {We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.}
}

Endnote

%0 Conference Paper
%T SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
%A Elias Frantar
%A Dan Alistarh
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett
%F pmlr-v202-frantar23a
%I PMLR
%P 10323--10337
%U https://proceedings.mlr.press/v202/frantar23a.html
%V 202
%X We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.

APA


Frantar, E. & Alistarh, D. (2023). SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:10323-10337. Available from https://proceedings.mlr.press/v202/frantar23a.html.
