Position: Editing Large Language Models Poses Serious Safety Risks

Paul Youssef, Zhixue Zhao, Daniel Braun, Jörg Schlötterer, Christin Seifert
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:82426-82440, 2025.

Abstract

Large Language Models (LLMs) encode large amounts of factual knowledge about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that the wide availability, low computational cost, high performance, and stealthiness of KEs make them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.
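
To make the "computationally inexpensive" and "limited side effects" points concrete, the following is a minimal, self-contained PyTorch sketch of the rank-one update idea behind locate-and-edit KEs such as ROME. It is illustrative only and not the authors' method or any specific KE implementation: real methods estimate the key and value vectors from model activations, whereas here everything is a toy tensor.

# Simplified rank-one edit: W' = W + (v* - W k*) k*^T / (k*^T k*).
# A single closed-form update to one weight matrix rewrites the value stored
# for one "key" direction while leaving orthogonal keys essentially untouched.
# Toy tensors only; not the paper's method or any particular KE library.
import torch

torch.manual_seed(0)
d_in, d_out = 64, 64
W = torch.randn(d_out, d_in) / d_in**0.5   # stand-in for one MLP weight matrix

k_star = torch.randn(d_in)                 # key: representation of the edited fact
v_star = torch.randn(d_out)                # desired new value for that key

# Rank-one edit: costs O(d_in * d_out), no gradient descent, no retraining.
delta = torch.outer(v_star - W @ k_star, k_star) / (k_star @ k_star)
W_edited = W + delta

print("edited key rewritten:", torch.allclose(W_edited @ k_star, v_star, atol=1e-4))

# Side effects are limited: a key orthogonalized against k_star is mapped
# (almost) exactly as before the edit.
k_other = torch.randn(d_in)
k_other = k_other - (k_other @ k_star) / (k_star @ k_star) * k_star
print("unrelated key preserved:", torch.allclose(W @ k_other, W_edited @ k_other, atol=1e-4))

The cheapness of such an update (one matrix modification, no retraining) and the fact that the rest of the model's behavior is numerically unchanged are what make edits of this kind both easy to perform and hard to detect.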

Cite this Paper


BibTeX
@InProceedings{pmlr-v267-youssef25a,
  title     = {Position: Editing Large Language Models Poses Serious Safety Risks},
  author    = {Youssef, Paul and Zhao, Zhixue and Braun, Daniel and Schl\"{o}tterer, J\"{o}rg and Seifert, Christin},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {82426--82440},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/youssef25a/youssef25a.pdf},
  url       = {https://proceedings.mlr.press/v267/youssef25a.html},
  abstract  = {Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note the fact that KEs are widely available, computationally inexpensive, highly performant, and stealthy makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.}
}
Endnote
%0 Conference Paper
%T Position: Editing Large Language Models Poses Serious Safety Risks
%A Paul Youssef
%A Zhixue Zhao
%A Daniel Braun
%A Jörg Schlötterer
%A Christin Seifert
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-youssef25a
%I PMLR
%P 82426--82440
%U https://proceedings.mlr.press/v267/youssef25a.html
%V 267
%X Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note the fact that KEs are widely available, computationally inexpensive, highly performant, and stealthy makes them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.
APA
Youssef, P., Zhao, Z., Braun, D., Schlötterer, J. & Seifert, C. (2025). Position: Editing Large Language Models Poses Serious Safety Risks. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:82426-82440. Available from https://proceedings.mlr.press/v267/youssef25a.html.
