GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance

Jinuk Kim, Marwa El Halabi, Wonpyo Park, Clemens Js Schaefer, Deokjae Lee, Yeonhong Park, Jae W. Lee, Hyun Oh Song
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:30011-30037, 2025.

Abstract

Post-training quantization is a key technique for reducing the memory and inference latency of large language models by quantizing weights and activations without requiring retraining. However, existing methods either (1) fail to account for the varying importance of hidden features to the end loss or, when incorporating end loss, (2) neglect the critical interactions between model weights. To address these limitations, we propose GuidedQuant, a novel quantization approach that integrates gradient information from the end loss into the quantization objective while preserving cross-weight dependencies within output channels. GuidedQuant consistently boosts the performance of state-of-the-art quantization methods across weight-only scalar, weight-only vector, and weight-and-activation quantization. Additionally, we introduce a novel non-uniform scalar quantization algorithm, which is guaranteed to monotonically decrease the quantization objective value, and outperforms existing methods in this category. We release the code at https://github.com/snu-mllab/GuidedQuant.
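For intuition only, the following is a minimal PyTorch sketch of what a gradient-weighted, per-output-channel layer-wise quantization objective of this kind could look like. The function name, tensor layout, and exact weighting are illustrative assumptions rather than the paper's formulation; refer to the linked repository for the actual implementation.

    # Illustrative sketch, NOT the paper's exact objective: a layer-wise quantization
    # error that (i) weights each calibration sample by the end-loss gradient of the
    # corresponding layer output and (ii) keeps one full Hessian-like matrix per
    # output channel, so interactions between weights within a channel are preserved.
    import torch

    def guided_layer_objective(W_hat: torch.Tensor,  # (d_out, d_in) quantized weights
                               W: torch.Tensor,      # (d_out, d_in) original weights
                               X: torch.Tensor,      # (N, d_in) calibration inputs
                               G: torch.Tensor) -> torch.Tensor:
        """Sum over output channels i of (w_hat_i - w_i)^T H_i (w_hat_i - w_i),
        with H_i = X^T diag(G[:, i]**2) X, where G holds gradients of the end
        loss with respect to the layer outputs, shape (N, d_out)."""
        delta = W_hat - W                                   # (d_out, d_in)
        total = X.new_zeros(())
        for i in range(W.shape[0]):
            Hi = X.T @ ((G[:, i] ** 2).unsqueeze(1) * X)    # (d_in, d_in), gradient-weighted
            total = total + delta[i] @ Hi @ delta[i]
        return total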

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-kim25d,
  title     = {{G}uided{Q}uant: Large Language Model Quantization via Exploiting End Loss Guidance},
  author    = {Kim, Jinuk and El Halabi, Marwa and Park, Wonpyo and Schaefer, Clemens Js and Lee, Deokjae and Park, Yeonhong and Lee, Jae W. and Song, Hyun Oh},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {30011--30037},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/kim25d/kim25d.pdf},
  url       = {https://proceedings.mlr.press/v267/kim25d.html}
}
Endnote
%0 Conference Paper
%T GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
%A Jinuk Kim
%A Marwa El Halabi
%A Wonpyo Park
%A Clemens Js Schaefer
%A Deokjae Lee
%A Yeonhong Park
%A Jae W. Lee
%A Hyun Oh Song
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-kim25d
%I PMLR
%P 30011--30037
%U https://proceedings.mlr.press/v267/kim25d.html
%V 267
APA
Kim, J., El Halabi, M., Park, W., Schaefer, C.J., Lee, D., Park, Y., Lee, J.W. & Song, H.O. (2025). GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:30011-30037. Available from https://proceedings.mlr.press/v267/kim25d.html.
