EvoPress: Accurate Dynamic Model Compression via Evolutionary Search

Oliver Sieberling, Denis Kuznedelev, Eldar Kurtic, Dan Alistarh
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:55556-55590, 2025.

Abstract

The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the "importance" of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating dynamic compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for dynamic compression of Llama, Mistral, and Phi models, setting new benchmarks for structural pruning (block/layer dropping), unstructured sparsity, and quantization with dynamic bitwidths.
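To make the idea concrete, below is a minimal, self-contained sketch of the kind of search the abstract describes: an evolutionary loop that assigns a per-layer compression level (here, a bitwidth) subject to a global budget. The level set, budget, mutation rule, and surrogate fitness function are illustrative assumptions, not the paper's exact algorithm; in EvoPress proper, fitness would be measured on calibration data for the actual compressed model.

```python
# Hedged sketch of evolutionary search over per-layer compression profiles.
# All constants and the fitness surrogate are assumptions for illustration only.
import random

LEVELS = [2, 3, 4, 8]      # candidate per-layer bitwidths (assumed)
NUM_LAYERS = 32            # e.g., number of transformer blocks
TARGET_AVG = 3.0           # global budget: average bitwidth across layers


def within_budget(profile):
    """Check the global compression constraint (average bitwidth)."""
    return sum(profile) / len(profile) <= TARGET_AVG


def fitness(profile):
    """Toy surrogate, lower is better. In practice this would be the
    calibration-set loss of the model compressed according to `profile`."""
    return sum((4 - b) ** 2 * (1.0 + 0.05 * i) for i, b in enumerate(profile))


def mutate(profile, num_swaps=2):
    """Level-switch mutation: give one layer more bits, take bits from another.
    Offspring that violate the budget fall back to a copy of the parent."""
    child = list(profile)
    for _ in range(num_swaps):
        i, j = random.sample(range(NUM_LAYERS), 2)
        li, lj = LEVELS.index(child[i]), LEVELS.index(child[j])
        if li + 1 < len(LEVELS) and lj > 0:
            child[i] = LEVELS[li + 1]
            child[j] = LEVELS[lj - 1]
    return child if within_budget(child) else list(profile)


def evolve(generations=200, offspring=8):
    """Simple (1 + lambda) evolutionary loop over compression profiles."""
    parent = [3] * NUM_LAYERS            # uniform profile at the target budget
    best = fitness(parent)
    for _ in range(generations):
        children = [mutate(parent) for _ in range(offspring)]
        champion = min(children, key=fitness)
        if fitness(champion) < best:
            parent, best = champion, fitness(champion)
    return parent


if __name__ == "__main__":
    profile = evolve()
    print("per-layer bitwidths:", profile)
    print("average bitwidth:", sum(profile) / len(profile))
```

The point of the sketch is the search structure: mutation proposes non-uniform profiles, the budget check enforces the global compression target, and selection keeps the profile whose (surrogate) error is lowest, without assuming layers contribute independently to the overall error.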

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-sieberling25a,
  title     = {{E}vo{P}ress: Accurate Dynamic Model Compression via Evolutionary Search},
  author    = {Sieberling, Oliver and Kuznedelev, Denis and Kurtic, Eldar and Alistarh, Dan},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {55556--55590},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/sieberling25a/sieberling25a.pdf},
  url       = {https://proceedings.mlr.press/v267/sieberling25a.html}
}
Endnote
%0 Conference Paper
%T EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
%A Oliver Sieberling
%A Denis Kuznedelev
%A Eldar Kurtic
%A Dan Alistarh
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-sieberling25a
%I PMLR
%P 55556--55590
%U https://proceedings.mlr.press/v267/sieberling25a.html
%V 267
APA
Sieberling, O., Kuznedelev, D., Kurtic, E., & Alistarh, D. (2025). EvoPress: Accurate Dynamic Model Compression via Evolutionary Search. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:55556-55590. Available from https://proceedings.mlr.press/v267/sieberling25a.html.