A Likelihood Based Approach for Watermark Detection

Xingchi Li, Guanxun Li, Xianyang Zhang
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1675-1683, 2025.

Abstract

Watermarking techniques embed statistical signals within content generated by large language models to help trace its source. Although existing methods perform well on long texts, their effectiveness significantly decreases for shorter texts. We introduce a statistical detection approach that improves the power of watermark detection, particularly in shorter texts. Our method leverages both the watermark key sequence and the next token probabilities (NTPs) to determine whether a text is generated by a large language model. We demonstrate the optimality of our approach and analyze its power properties. We also investigate an approach to estimating NTPs and extend our method to scenarios where texts face potential attacks such as substitutions, insertions, or deletions. We validate the effectiveness of our technique using texts generated by Meta-Llama-3-8B from Meta and Mistral-7B-v0.1 from Mistral AI, utilizing prompts extracted from Google’s C4 dataset. In scenarios without attacks and with short text lengths, our method demonstrates approximately 65% power improvement compared to the baseline method on average. We release all code publicly at https://github.com/doccstat/llm-watermark-adaptive.
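The abstract does not give the test statistic itself. As a rough illustration only, the sketch below assumes an exponential-minimum (Gumbel-max style) watermark: with key values U[t, v] ~ Uniform(0, 1) and NTPs p[t, v], the token at position t is the v that maximises U[t, v]^(1/p[t, v]). Under that assumption, Y_t = U[t, w_t] for the observed token w_t is Uniform(0, 1) when the text was written independently of the key. For watermarked text it has density (1/p) y^(1/p - 1) with p = p[t, w_t], because max_v U[t, v]^(1/p[t, v]) is itself Uniform(0, 1) and independent of which token attains it. Summing the per-token log-likelihood ratios and calibrating the sum by Monte Carlo gives a test that uses both the key sequence and the NTPs. This is a sketch, not the authors' implementation (their code is in the repository linked above). The function names, the toy vocabulary size, and the Dirichlet-distributed NTPs are hypothetical. The NTPs are treated as known rather than estimated, and no substitution, insertion, or deletion attacks are handled.

```python
import numpy as np


def generate_watermarked(ntps, keys):
    """Exponential-minimum (Gumbel-max style) sampling: at position t choose the
    token v that maximises keys[t, v] ** (1 / ntps[t, v]).
    Assumes every NTP entry is strictly positive and keys lie in (0, 1]."""
    scores = np.log(keys) / ntps          # log of keys ** (1 / ntps), row by row
    return scores.argmax(axis=1)


def llr_terms(p, y):
    """Per-token log f1(y) - log f0(y), where f0 is the Uniform(0, 1) density (= 1)
    and f1(y) = (1 / p) * y ** (1 / p - 1) is the watermarked alternative."""
    return -np.log(p) + (1.0 / p - 1.0) * np.log(y)


def likelihood_pvalue(tokens, ntps, keys, rng, n_null=2000):
    """Sum the per-token log-likelihood ratios and calibrate by Monte Carlo:
    given the text, the Y_t are i.i.d. Uniform(0, 1) under H0 (key independent of text)."""
    idx = np.arange(len(tokens))
    p = ntps[idx, tokens]                 # NTP of each observed token
    y = keys[idx, tokens]                 # key value attached to each observed token
    observed = llr_terms(p, y).sum()
    y_null = 1.0 - rng.random((n_null, len(tokens)))   # Uniform(0, 1] draws
    null_stats = llr_terms(p, y_null).sum(axis=1)
    return (1 + np.count_nonzero(null_stats >= observed)) / (1 + n_null)


# Toy demo. The NTPs are drawn independently per position here, instead of being
# produced by a language model conditioned on earlier tokens.
rng = np.random.default_rng(0)
T, V = 30, 50                                    # short text, toy vocabulary
ntps = rng.dirichlet(np.full(V, 0.3), size=T)    # hypothetical NTPs, one row per position
keys = 1.0 - rng.random((T, V))                  # watermark key, Uniform(0, 1] per (t, v)

wm_tokens = generate_watermarked(ntps, keys)
human_tokens = np.array([rng.choice(V, p=row) for row in ntps])   # ignores the key

p_wm = likelihood_pvalue(wm_tokens, ntps, keys, rng)
p_human = likelihood_pvalue(human_tokens, ntps, keys, rng)
print(f"watermarked p-value: {p_wm:.4f}, human p-value: {p_human:.4f}")
```

In this sketch, positions where the observed token has p close to 1 contribute almost nothing to the statistic, since the two densities then nearly coincide. Most of the evidence comes from higher-entropy positions.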

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-li25d,
  title = {A Likelihood Based Approach for Watermark Detection},
  author = {Li, Xingchi and Li, Guanxun and Zhang, Xianyang},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages = {1675--1683},
  year = {2025},
  editor = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume = {258},
  series = {Proceedings of Machine Learning Research},
  month = {03--05 May},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/li25d/li25d.pdf},
  url = {https://proceedings.mlr.press/v258/li25d.html},
  abstract = {Watermarking techniques embed statistical signals within content generated by large language models to help trace its source. Although existing methods perform well on long texts, their effectiveness significantly decreases for shorter texts. We introduce a statistical detection approach that improves the power of watermark detection, particularly in shorter texts. Our method leverages both the watermark key sequence and the next token probabilities (NTPs) to determine whether a text is generated by a large language model. We demonstrate the optimality of our approach and analyze its power properties. We also investigate an approach to estimating NTPs and extend our method to scenarios where texts face potential attacks such as substitutions, insertions, or deletions. We validate the effectiveness of our technique using texts generated by Meta-Llama-3-8B from Meta and Mistral-7B-v0.1 from Mistral AI, utilizing prompts extracted from Google’s C4 dataset. In scenarios without attacks and with short text lengths, our method demonstrates approximately 65% power improvement compared to the baseline method on average. We release all code publicly at \url{https://github.com/doccstat/llm-watermark-adaptive}.}
}
Endnote
%0 Conference Paper
%T A Likelihood Based Approach for Watermark Detection
%A Xingchi Li
%A Guanxun Li
%A Xianyang Zhang
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-li25d
%I PMLR
%P 1675--1683
%U https://proceedings.mlr.press/v258/li25d.html
%V 258
%X Watermarking techniques embed statistical signals within content generated by large language models to help trace its source. Although existing methods perform well on long texts, their effectiveness significantly decreases for shorter texts. We introduce a statistical detection approach that improves the power of watermark detection, particularly in shorter texts. Our method leverages both the watermark key sequence and the next token probabilities (NTPs) to determine whether a text is generated by a large language model. We demonstrate the optimality of our approach and analyze its power properties. We also investigate an approach to estimating NTPs and extend our method to scenarios where texts face potential attacks such as substitutions, insertions, or deletions. We validate the effectiveness of our technique using texts generated by Meta-Llama-3-8B from Meta and Mistral-7B-v0.1 from Mistral AI, utilizing prompts extracted from Google’s C4 dataset. In scenarios without attacks and with short text lengths, our method demonstrates approximately 65% power improvement compared to the baseline method on average. We release all code publicly at https://github.com/doccstat/llm-watermark-adaptive.
APA
Li, X., Li, G., & Zhang, X. (2025). A Likelihood Based Approach for Watermark Detection. Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:1675-1683. Available from https://proceedings.mlr.press/v258/li25d.html.
