GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks

Wenwu Tang, Dong Wang, Lothar Thiele, Olga Saukh
Conference on Parsimony and Learning, PMLR 328:881-895, 2026.

Abstract

Structured deep model compression methods reduce memory and inference costs, but most existing approaches still suffer notable accuracy degradation under aggressive compression. We propose post-hoc blockwise compensation, called GRAIL, a simple zero-finetuning step applied after pruning or folding that restores each block's input–output behavior using a small calibration set. The method summarizes producer-side activations with a Gram matrix and solves a ridge least-squares problem to project the original hidden representation onto the reduced hidden space, yielding a linear map that is merged into the consumer weights while the producer is narrowed to the selected or folded outputs. The approach is selector-agnostic (magnitude, Wanda, Gram-based selection, or folding), data-aware (requiring only a few forward passes, without gradients or labels), and recovers classic pruning/folding when the Gram matrix is near identity. Across ResNets, ViTs, and decoder-only LLMs, post-hoc compensation with GRAIL consistently improves accuracy or perplexity over data-free and data-aware pruning/folding baselines in practical compression regimes, with manageable overhead and no backpropagation. Our code is available at: https://github.com/TWWinde/GRAIL_Compensation
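
To make the compensation step concrete: with calibration activations H (n samples × d producer outputs) and a retained index set S, the ridge least-squares map A = (H_S^T H_S + λI)^{-1} H_S^T H projects the original hidden representation onto the reduced one, and both factors are blocks of the Gram matrix G = H^T H, so a single pass of activation statistics suffices. The minimal NumPy sketch below illustrates this construction; the names (ridge_compensation, H, keep_idx, lam) are hypothetical, and this is an illustration of the math in the abstract, not the authors' released implementation.

import numpy as np

def ridge_compensation(H, keep_idx, lam=1e-4):
    """Sketch: solve A = argmin_A ||H[:, S] @ A - H||_F^2 + lam * ||A||_F^2."""
    # H        : (n, d) producer activations on a small calibration set
    # keep_idx : indices S of the k retained output channels (any selector)
    # Returns A: (k, d) linear map with H[:, S] @ A ≈ H
    G = H.T @ H                             # (d, d) Gram matrix: the only data summary used
    S = np.asarray(keep_idx)
    G_SS = G[np.ix_(S, S)]                  # (k, k) Gram block of retained channels
    G_Sd = G[S, :]                          # (k, d) cross block against all d channels
    # Normal equations of the ridge problem; no gradients or labels involved.
    return np.linalg.solve(G_SS + lam * np.eye(len(S)), G_Sd)

If the consumer layer computes H @ W_c, the compensated model instead computes H[:, keep_idx] @ (A @ W_c), so A is absorbed into the consumer weights and adds no inference-time cost. When G is close to the identity, A approaches a plain selection matrix, consistent with the recovery of classic pruning/folding stated in the abstract.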

Cite this Paper


BibTeX
@InProceedings{pmlr-v328-tang26a,
  title     = {GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks},
  author    = {Tang, Wenwu and Wang, Dong and Thiele, Lothar and Saukh, Olga},
  booktitle = {Conference on Parsimony and Learning},
  pages     = {881--895},
  year      = {2026},
  editor    = {Burkholz, Rebekka and Liu, Shiwei and Ravishankar, Saiprasad and Redman, William and Huang, Wei and Su, Weijie and Zhu, Zhihui},
  volume    = {328},
  series    = {Proceedings of Machine Learning Research},
  month     = {23--26 Mar},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v328/main/assets/tang26a/tang26a.pdf},
  url       = {https://proceedings.mlr.press/v328/tang26a.html},
  abstract  = {Structured deep model compression methods reduce memory and inference costs, but the majority of existing approaches still suffer from notable accuracy degradation under aggressive compression. We propose \emph{post-hoc blockwise compensation}, called GRAIL, a simple zero-finetuning step applied after pruning or folding that restores each block’s input–output behavior using a small calibration set. The method summarizes producer-side activations with a Gram matrix and solves a ridge least-squares problem to project the original hidden representation onto the reduced hidden space, yielding a linear map that is merged into the consumer weights while the producer is narrowed to the selected or folded outputs. The approach is selector-agnostic (magnitude, Wanda, Gram-based selection, or folding), data-aware (requiring only a few forward passes without gradients or labels), and recovers classic pruning/folding when the Gram matrix is near identity. Across ResNets, ViTs, and decoder-only LLMs, post-hoc compensation with GRAIL consistently improves accuracy or perplexity over data-free and data-aware pruning/folding baselines in practical compression regimes, with manageable overhead and no backpropagation. Our code is available at: \href{https://github.com/TWWinde/GRAIL_Compensation}{https://github.com/TWWinde/GRAIL}}
}
Endnote
%0 Conference Paper
%T GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks
%A Wenwu Tang
%A Dong Wang
%A Lothar Thiele
%A Olga Saukh
%B Conference on Parsimony and Learning
%C Proceedings of Machine Learning Research
%D 2026
%E Rebekka Burkholz
%E Shiwei Liu
%E Saiprasad Ravishankar
%E William Redman
%E Wei Huang
%E Weijie Su
%E Zhihui Zhu
%F pmlr-v328-tang26a
%I PMLR
%P 881--895
%U https://proceedings.mlr.press/v328/tang26a.html
%V 328
%X Structured deep model compression methods reduce memory and inference costs, but the majority of existing approaches still suffer from notable accuracy degradation under aggressive compression. We propose \emph{post-hoc blockwise compensation}, called GRAIL, a simple zero-finetuning step applied after pruning or folding that restores each block’s input–output behavior using a small calibration set. The method summarizes producer-side activations with a Gram matrix and solves a ridge least-squares problem to project the original hidden representation onto the reduced hidden space, yielding a linear map that is merged into the consumer weights while the producer is narrowed to the selected or folded outputs. The approach is selector-agnostic (magnitude, Wanda, Gram-based selection, or folding), data-aware (requiring only a few forward passes without gradients or labels), and recovers classic pruning/folding when the Gram matrix is near identity. Across ResNets, ViTs, and decoder-only LLMs, post-hoc compensation with GRAIL consistently improves accuracy or perplexity over data-free and data-aware pruning/folding baselines in practical compression regimes, with manageable overhead and no backpropagation. Our code is available at: \href{https://github.com/TWWinde/GRAIL_Compensation}{https://github.com/TWWinde/GRAIL}
APA
Tang, W., Wang, D., Thiele, L. & Saukh, O. (2026). GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks. Conference on Parsimony and Learning, in Proceedings of Machine Learning Research 328:881-895. Available from https://proceedings.mlr.press/v328/tang26a.html.

Related Material

Download PDF: https://raw.githubusercontent.com/mlresearch/v328/main/assets/tang26a/tang26a.pdf