Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

Riade Benbaki; Wenyu Chen; Xiang Meng; Hussein Hazimeh; Natalia Ponomareva; Zhe Zhao; Rahul Mazumder

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

Proceedings of the 40th International Conference on Machine Learning, PMLR 202:2031-2049, 2023.

Abstract

The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA’s main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to superior sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches. Our code is publicly available at: https://github.com/mazumder-lab/CHITA .

Cite this Paper

BibTeX


@InProceedings{pmlr-v202-benbaki23a,
  title = 	 {Fast as {CHITA}: Neural Network Pruning with Combinatorial Optimization},
  author =       {Benbaki, Riade and Chen, Wenyu and Meng, Xiang and Hazimeh, Hussein and Ponomareva, Natalia and Zhao, Zhe and Mazumder, Rahul},
  booktitle = 	 {Proceedings of the 40th International Conference on Machine Learning},
  pages = 	 {2031--2049},
  year = 	 {2023},
  editor = 	 {Krause, Andreas and Brunskill, Emma and Cho, Kyunghyun and Engelhardt, Barbara and Sabato, Sivan and Scarlett, Jonathan},
  volume = 	 {202},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {23--29 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://proceedings.mlr.press/v202/benbaki23a/benbaki23a.pdf},
  url = 	 {https://proceedings.mlr.press/v202/benbaki23a.html},
  abstract = 	 {The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA’s main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to superior sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches. Our code is publicly available at: https://github.com/mazumder-lab/CHITA .}
}

Endnote

%0 Conference Paper
%T Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
%A Riade Benbaki
%A Wenyu Chen
%A Xiang Meng
%A Hussein Hazimeh
%A Natalia Ponomareva
%A Zhe Zhao
%A Rahul Mazumder
%B Proceedings of the 40th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2023
%E Andreas Krause
%E Emma Brunskill
%E Kyunghyun Cho
%E Barbara Engelhardt
%E Sivan Sabato
%E Jonathan Scarlett	
%F pmlr-v202-benbaki23a
%I PMLR
%P 2031--2049
%U https://proceedings.mlr.press/v202/benbaki23a.html
%V 202
%X The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA’s main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to superior sparsity-accuracy tradeoffs than competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches. Our code is publicly available at: https://github.com/mazumder-lab/CHITA .

APA


Benbaki, R., Chen, W., Meng, X., Hazimeh, H., Ponomareva, N., Zhao, Z. & Mazumder, R.. (2023). Fast as CHITA: Neural Network Pruning with Combinatorial Optimization. Proceedings of the 40th International Conference on Machine Learning, in Proceedings of Machine Learning Research 202:2031-2049 Available from https://proceedings.mlr.press/v202/benbaki23a.html.

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

Abstract

Cite this Paper

Related Material