SAFE: Finding Sparse and Flat Minima to Improve Pruning

Dongyeop Lee, Kwanhee Lee, Jinseok Chung, Namhoon Lee
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:33300-33321, 2025.

Abstract

Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and it remains challenging to restore the original performance despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem where flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in novel pruning methods called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, which compares competitively to well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.
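As a rough sketch of the formulation the abstract describes (assumed here for illustration; the paper's exact equations may differ), pruning can be posed as a sharpness-aware minimization problem under a hard sparsity constraint, which is then relaxed with an augmented Lagrangian that splits off an auxiliary sparse copy of the weights:

% Sketch only: L is the training loss, \rho a flatness (perturbation) radius,
% k the number of weights kept, and z an auxiliary variable constrained to be sparse.
\[
  \min_{w}\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
  \quad \text{subject to} \quad \|w\|_0 \le k,
\]
\[
  \mathcal{L}_{\lambda}(w, z, u) \;=\;
  \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
  \;+\; u^{\top}(w - z)
  \;+\; \frac{\lambda}{2}\,\|w - z\|_2^2,
  \qquad \|z\|_0 \le k,
\]
% where z is obtained by projecting w onto the sparsity constraint (e.g. keeping
% the k largest-magnitude entries) and u is the dual variable.

Under this reading, the "generalized projection operation" of SAFE$^+$ would replace the plain hard-thresholding step used to form z; this is an interpretation of the abstract, not the paper's stated algorithm.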

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-lee25s,
  title     = {{SAFE}: Finding Sparse and Flat Minima to Improve Pruning},
  author    = {Lee, Dongyeop and Lee, Kwanhee and Chung, Jinseok and Lee, Namhoon},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {33300--33321},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/lee25s/lee25s.pdf},
  url       = {https://proceedings.mlr.press/v267/lee25s.html},
  abstract  = {Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and it remains challenging to restore the original performance despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem where flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in novel pruning methods called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, which compares competitively to well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.}
}
Endnote
%0 Conference Paper
%T SAFE: Finding Sparse and Flat Minima to Improve Pruning
%A Dongyeop Lee
%A Kwanhee Lee
%A Jinseok Chung
%A Namhoon Lee
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-lee25s
%I PMLR
%P 33300--33321
%U https://proceedings.mlr.press/v267/lee25s.html
%V 267
%X Sparsifying neural networks often suffers from seemingly inevitable performance degradation, and it remains challenging to restore the original performance despite much recent progress. Motivated by recent studies in robust optimization, we aim to tackle this problem by finding subnetworks that are both sparse and flat at the same time. Specifically, we formulate pruning as a sparsity-constrained optimization problem where flatness is encouraged as an objective. We solve it explicitly via an augmented Lagrange dual approach and extend it further by proposing a generalized projection operation, resulting in novel pruning methods called SAFE and its extension, SAFE$^+$. Extensive evaluations on standard image classification and language modeling tasks reveal that SAFE consistently yields sparse networks with improved generalization performance, which compares competitively to well-established baselines. In addition, SAFE demonstrates resilience to noisy data, making it well-suited for real-world conditions.
APA
Lee, D., Lee, K., Chung, J. & Lee, N. (2025). SAFE: Finding Sparse and Flat Minima to Improve Pruning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:33300-33321. Available from https://proceedings.mlr.press/v267/lee25s.html.