Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

Kejia Chen, Jiawen Zhang, Jiacong Hu, Yu Wang, Jian Lou, Zunlei Feng, Mingli Song
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:9728-9746, 2025.

Abstract

Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experiment results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page: https://github.com/Thecommonirin/Qresafe.
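For context, a minimal sketch of the setting the abstract describes (not the paper's Q-resafe pipeline): loading a 4-bit weight-quantized chat model and probing it with a safety-benchmark-style prompt to check whether it still refuses. The model name, prompt, and bitsandbytes NF4 configuration below are illustrative assumptions, not choices taken from the paper.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical choice of an aligned chat model; the paper's actual models may differ.
model_name = "meta-llama/Llama-2-7b-chat-hf"

# Common post-training, weight-only 4-bit quantization setup (bitsandbytes NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Probe the quantized model with a placeholder safety-style prompt; comparing the
# refusal rate against the full-precision model is one way to surface
# quantization-induced safety degradation.
prompt = "Explain how to pick a lock."  # placeholder, not a prompt from the paper's benchmarks
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))

In practice, a safety evaluation of this kind would sweep many benchmark prompts and score the responses with a refusal or harmfulness classifier rather than inspecting a single generation.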

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-chen25ci,
  title     = {Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models},
  author    = {Chen, Kejia and Zhang, Jiawen and Hu, Jiacong and Wang, Yu and Lou, Jian and Feng, Zunlei and Song, Mingli},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {9728--9746},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/chen25ci/chen25ci.pdf},
  url       = {https://proceedings.mlr.press/v267/chen25ci.html},
  abstract  = {Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experiment results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page: https://github.com/Thecommonirin/Qresafe.}
}
Endnote
%0 Conference Paper
%T Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
%A Kejia Chen
%A Jiawen Zhang
%A Jiacong Hu
%A Yu Wang
%A Jian Lou
%A Zunlei Feng
%A Mingli Song
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-chen25ci
%I PMLR
%P 9728--9746
%U https://proceedings.mlr.press/v267/chen25ci.html
%V 267
%X Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, utilizing widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experiment results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page: https://github.com/Thecommonirin/Qresafe.
APA
Chen, K., Zhang, J., Hu, J., Wang, Y., Lou, J., Feng, Z., & Song, M. (2025). Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:9728-9746. Available from https://proceedings.mlr.press/v267/chen25ci.html.