Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:69279-69292, 2025.

Abstract

We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that the difference between their paraphrases, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode a pre-defined binary code at the sentence level. We then use a text classifier as the decoder to recover each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantics of the original sentences. More importantly, our pipeline is robust under word-substitution and sentence-paraphrasing perturbations and generalizes well to out-of-distribution data. We also demonstrate the stealthiness of our watermark with LLM-based evaluation.
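
To make the pipeline concrete, here is a minimal Python sketch of the bit-per-sentence encode/decode loop the abstract describes. It is not the authors' code: the function names are hypothetical, and the toy string transforms stand in for the pair of fine-tuned 1.1B LLM paraphrasers and the trained text-classifier decoder.

# Sketch of the watermark pipeline described in the abstract.
# Hypothetical names; toy stand-ins replace the fine-tuned paraphrasers
# and the trained classifier from the paper.
from typing import Callable, List

def embed_watermark(
    sentences: List[str],
    bits: List[int],
    paraphrase_bit0: Callable[[str], str],
    paraphrase_bit1: Callable[[str], str],
) -> List[str]:
    """Encode one bit per sentence by choosing which paraphraser rewrites it."""
    encoded = []
    for i, sentence in enumerate(sentences):
        bit = bits[i % len(bits)]  # repeat the code if the text outruns it
        rewrite = paraphrase_bit1 if bit == 1 else paraphrase_bit0
        encoded.append(rewrite(sentence))
    return encoded

def decode_watermark(
    sentences: List[str],
    prob_bit1: Callable[[str], float],  # decoder: P(bit = 1 | sentence)
) -> List[int]:
    """Recover one bit per sentence with the decoder's per-sentence score."""
    return [int(prob_bit1(s) > 0.5) for s in sentences]

if __name__ == "__main__":
    # Trivial stand-ins so the sketch runs end to end: "paraphraser 0"
    # lowercases, "paraphraser 1" uppercases, and the "decoder" checks casing.
    text = ["The cat sat on the mat.", "It purred loudly.", "Then it slept."]
    code = [1, 0, 1]
    marked = embed_watermark(text, code, str.lower, str.upper)
    assert decode_watermark(marked, lambda s: float(s.isupper())) == code

In the paper, the two rewrite callables would be the fine-tuned LLM paraphrasers and prob_bit1 the trained text classifier; the stubs above only fix the interface so the sketch runs end to end.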

Cite this Paper

BibTeX
@InProceedings{pmlr-v267-xu25k,
  title     = {Robust Multi-bit Text Watermark with {LLM}-based Paraphrasers},
  author    = {Xu, Xiaojun and Jia, Jinghan and Yao, Yuanshun and Liu, Yang and Li, Hang},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  pages     = {69279--69292},
  year      = {2025},
  editor    = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume    = {267},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--19 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v267/main/assets/xu25k/xu25k.pdf},
  url       = {https://proceedings.mlr.press/v267/xu25k.html},
  abstract  = {We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that the difference between their paraphrases, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode a pre-defined binary code at the sentence level. We then use a text classifier as the decoder to recover each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantics of the original sentences. More importantly, our pipeline is robust under word-substitution and sentence-paraphrasing perturbations and generalizes well to out-of-distribution data. We also demonstrate the stealthiness of our watermark with LLM-based evaluation.}
}
Endnote
%0 Conference Paper
%T Robust Multi-bit Text Watermark with LLM-based Paraphrasers
%A Xiaojun Xu
%A Jinghan Jia
%A Yuanshun Yao
%A Yang Liu
%A Hang Li
%B Proceedings of the 42nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2025
%E Aarti Singh
%E Maryam Fazel
%E Daniel Hsu
%E Simon Lacoste-Julien
%E Felix Berkenkamp
%E Tegan Maharaj
%E Kiri Wagstaff
%E Jerry Zhu
%F pmlr-v267-xu25k
%I PMLR
%P 69279--69292
%U https://proceedings.mlr.press/v267/xu25k.html
%V 267
%X We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that the difference between their paraphrases, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode a pre-defined binary code at the sentence level. We then use a text classifier as the decoder to recover each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantics of the original sentences. More importantly, our pipeline is robust under word-substitution and sentence-paraphrasing perturbations and generalizes well to out-of-distribution data. We also demonstrate the stealthiness of our watermark with LLM-based evaluation.
APA
Xu, X., Jia, J., Yao, Y., Liu, Y. & Li, H. (2025). Robust Multi-bit Text Watermark with LLM-based Paraphrasers. Proceedings of the 42nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 267:69279-69292. Available from https://proceedings.mlr.press/v267/xu25k.html.
