Vanishing Feature: Diagnosing Model Merging and Beyond

Xingyu Qu, Samuel Horváth
Conference on Parsimony and Learning, PMLR 280:1051-1086, 2025.

Abstract

Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models with different initializations. We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through the merged model, degrading performance. Through theoretical and empirical analysis, we reveal that this phenomenon underpins challenges like variance collapse and explains techniques like permutation-based merging, post-merging normalization, etc. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. Leveraging these insights, we propose the "Preserve-First Merging" (PFM) strategy, which preserves early-layer features, enabling merged VGG16 models on CIFAR-10 to surpass the original models without post-training for the first time. Furthermore, we demonstrate that the vanishing feature phenomenon extends to other contexts, such as model pruning. Applying post-pruning normalization to mitigate the issue significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.
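To make the setting concrete, below is a minimal PyTorch sketch (not the authors' implementation; see the linked repository for that) of the basic operation the abstract refers to: averaging the parameters of two independently initialized networks, then probing how the norm of the input-induced (mean-centered) activations evolves layer by layer in the merged model. The make_mlp helper, the toy architecture, and the mean-centering probe are illustrative assumptions, not the paper's exact diagnostic.

import torch
import torch.nn as nn

# Two independently trained networks would normally be loaded from checkpoints;
# randomly initialized MLPs are used here purely to illustrate the mechanics.
def make_mlp():
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 10),
    )

model_a, model_b = make_mlp(), make_mlp()

# Vanilla merging: average the parameters of the two endpoint models.
sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
merged = make_mlp()
merged.load_state_dict({k: 0.5 * sd_a[k] + 0.5 * sd_b[k] for k in sd_a})

# Probe the "vanishing feature" effect: track the norm of the input-induced
# (batch-mean-subtracted) activations after each layer.
def layerwise_feature_norms(model, x):
    norms = []
    h = x
    with torch.no_grad():
        for layer in model:
            h = layer(h)
            centered = h - h.mean(dim=0, keepdim=True)  # drop the input-independent part
            norms.append(round(centered.norm().item(), 2))
    return norms

x = torch.randn(256, 32)
print("endpoint A:", layerwise_feature_norms(model_a, x))
print("merged    :", layerwise_feature_norms(merged, x))

In this toy setup, comparing the two printed norm profiles shows how much of the input-dependent signal survives propagation through the merged model relative to an endpoint model; the paper's analysis concerns this attenuation in trained networks with different initializations.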

Cite this Paper


BibTeX
@InProceedings{pmlr-v280-qu25a,
  title = {Vanishing Feature: Diagnosing Model Merging and Beyond},
  author = {Qu, Xingyu and Horv\'{a}th, Samuel},
  booktitle = {Conference on Parsimony and Learning},
  pages = {1051--1086},
  year = {2025},
  editor = {Chen, Beidi and Liu, Shijia and Pilanci, Mert and Su, Weijie and Sulam, Jeremias and Wang, Yuxiang and Zhu, Zhihui},
  volume = {280},
  series = {Proceedings of Machine Learning Research},
  month = {24--27 Mar},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v280/main/assets/qu25a/qu25a.pdf},
  url = {https://proceedings.mlr.press/v280/qu25a.html},
  abstract = {Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models with different initializations. We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through the merged model, degrading performance. Through theoretical and empirical analysis, we reveal that this phenomenon underpins challenges like variance collapse and explains techniques like permutation-based merging, post-merging normalization, etc. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. Leveraging these insights, we propose the "Preserve-First Merging" (PFM) strategy, which preserves early-layer features, enabling merged VGG16 models on CIFAR-10 to surpass the original models without post-training for the first time. Furthermore, we demonstrate that the vanishing feature phenomenon extends to other contexts, such as model pruning. Applying post-pruning normalization to mitigate the issue significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.}
}
Endnote
%0 Conference Paper
%T Vanishing Feature: Diagnosing Model Merging and Beyond
%A Xingyu Qu
%A Samuel Horváth
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Beidi Chen
%E Shijia Liu
%E Mert Pilanci
%E Weijie Su
%E Jeremias Sulam
%E Yuxiang Wang
%E Zhihui Zhu
%F pmlr-v280-qu25a
%I PMLR
%P 1051--1086
%U https://proceedings.mlr.press/v280/qu25a.html
%V 280
%X Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models with different initializations. We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through the merged model, degrading performance. Through theoretical and empirical analysis, we reveal that this phenomenon underpins challenges like variance collapse and explains techniques like permutation-based merging, post-merging normalization, etc. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. Leveraging these insights, we propose the "Preserve-First Merging" (PFM) strategy, which preserves early-layer features, enabling merged VGG16 models on CIFAR-10 to surpass the original models without post-training for the first time. Furthermore, we demonstrate that the vanishing feature phenomenon extends to other contexts, such as model pruning. Applying post-pruning normalization to mitigate the issue significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.
APA
Qu, X. & Horváth, S. (2025). Vanishing Feature: Diagnosing Model Merging and Beyond. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 280:1051-1086. Available from https://proceedings.mlr.press/v280/qu25a.html.
