Textual Unlearning Gives a False Sense of Unlearning

Jiacheng Du, Zhibo Wang, Jie Zhang, Xiaoyi Pang, Jiahui Hu, Kui Ren
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:14579-14597, 2025.

Abstract

Language Models (LMs) are prone to "memorizing" training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently "forget" specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.
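
To make the auditing idea concrete, the sketch below shows the generic Gaussian likelihood-ratio test that LiRA-style attacks build on: shadow models are trained with and without the target text, the target's losses under each "world" are fitted with Gaussians, and the audited model's loss is scored by the log-likelihood ratio. This is a minimal illustration of the general technique under stated assumptions, not the paper's exact U-LiRA+ procedure; the lira_score helper, the shadow-model counts, and the sampled loss values are all hypothetical.

    import numpy as np
    from scipy.stats import norm

    def lira_score(target_loss, losses_unlearned, losses_retrained):
        # Fit a Gaussian to the target text's losses under each world:
        # models that trained on the text and then unlearned it, versus
        # models retrained from scratch without it (the ideal baseline).
        mu_u, sd_u = np.mean(losses_unlearned), np.std(losses_unlearned) + 1e-8
        mu_r, sd_r = np.mean(losses_retrained), np.std(losses_retrained) + 1e-8
        # Log-likelihood ratio; large positive values mean the audited model
        # looks like it merely unlearned the text rather than never having
        # seen it, i.e. the unlearning is detectable.
        return (norm.logpdf(target_loss, mu_u, sd_u)
                - norm.logpdf(target_loss, mu_r, sd_r))

    # Hypothetical audit with 64 shadow models per world (synthetic losses).
    rng = np.random.default_rng(0)
    losses_unlearned = rng.normal(2.1, 0.3, 64)
    losses_retrained = rng.normal(3.0, 0.4, 64)
    print(lira_score(2.2, losses_unlearned, losses_retrained))

If scores of this kind separate the two worlds with high confidence, the text remains detectable after unlearning, which is the failure mode the abstract reports. The same before-versus-after comparison also hints at the black-box leakage intuition: an adversary holding both the pre- and post-unlearning models can rank candidate texts by how much their loss shifts between the two versions.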

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-du25d,
  title     = {Textual Unlearning Gives a False Sense of Unlearning},
  author    = {Du, Jiacheng and Wang, Zhibo and Zhang, Jie and Pang, Xiaoyi and Hu, Jiahui and Ren, Kui},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {14579--14597},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/du25d/du25d.pdf},
  url       = {https://proceedings.mlr.press/v267/du25d.html},
  abstract  = {Language Models (LMs) are prone to "memorizing" training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently "forget" specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.}
}
Endnote
%0 Conference Paper
%T Textual Unlearning Gives a False Sense of Unlearning
%A Jiacheng Du
%A Zhibo Wang
%A Jie Zhang
%A Xiaoyi Pang
%A Jiahui Hu
%A Kui Ren
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-du25d
%I PMLR
%P 14579--14597
%U https://proceedings.mlr.press/v267/du25d.html
%V 267
%X Language Models (LMs) are prone to "memorizing" training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently "forget" specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.
APA
Du, J., Wang, Z., Zhang, J., Pang, X., Hu, J. & Ren, K. (2025). Textual Unlearning Gives a False Sense of Unlearning. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:14579-14597. Available from https://proceedings.mlr.press/v267/du25d.html.