Position: On the Possibilities of AI-Generated Text Detection

Souradip Chakraborty, Amrit Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:6093-6115, 2024.

Abstract

Our study addresses the challenge of distinguishing human-written text from Large Language Model (LLM) outputs. We provide evidence that this differentiation is consistently feasible, except when the human and machine text distributions are indistinguishable across their entire support. Employing information theory, we show that while detecting machine-generated text becomes harder as it approaches human quality, it remains possible given adequate text data. By deriving sample complexity bounds, we provide guidelines on the quantity of text data, in terms of sample size or sequence length, required for reliable AI-generated text detection. This research paves the way for advanced detection methods. Our comprehensive empirical tests, conducted across various datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors such as RoBERTa-Large/Base-Detector and GPTZero as sample sizes and sequence lengths increase. Our findings align with OpenAI’s empirical observations on sequence length and provide the first theoretical substantiation for those observations.
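
A note on the style of argument (our own sketch using standard information-theoretic identities, not the paper's exact statements): write m and h for the machine and human text distributions, TV for total variation distance, and rho(m, h) = sum_x sqrt(m(x) h(x)) for the Bhattacharyya/Hellinger affinity. The impossibility argument this position paper responds to bounds the AUROC of any detector D by a function of TV(m, h); over n i.i.d. samples, however, the affinity factorizes, so the same bound tends to 1:

\[
\mathrm{AUROC}(D) \;\le\; \tfrac{1}{2} + \mathrm{TV}\big(m^{\otimes n}, h^{\otimes n}\big) - \tfrac{1}{2}\,\mathrm{TV}\big(m^{\otimes n}, h^{\otimes n}\big)^{2},
\qquad
\mathrm{TV}\big(m^{\otimes n}, h^{\otimes n}\big) \;\ge\; 1 - \rho(m, h)^{n} \;\to\; 1 \ \text{ as } n \to \infty \ \text{ whenever } m \ne h.
\]

Requiring 1 - rho^n >= 1 - epsilon gives n >= log(1/epsilon) / log(1/rho(m, h)), a sample complexity of the flavor the abstract alludes to; consult the paper itself for the exact bounds. To see the trend empirically, the toy simulation below (entirely our construction, not the authors' code; the Gaussian score model and the 0.25-sigma gap between classes are arbitrary assumptions) averages a per-text detector score over n independent texts and estimates AUROC, which climbs toward 1 as n grows:

import numpy as np

rng = np.random.default_rng(0)

def auroc(pos, neg):
    # Empirical AUROC: probability that a positive (machine) score exceeds
    # a negative (human) score, counting ties as 1/2.
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq

trials = 2000  # texts per class
for n in [1, 5, 25, 100]:
    # Per-text scores: heavily overlapping Gaussians separated by 0.25 sigma
    # (an arbitrary stand-in for a weak single-sample detector); the decision
    # statistic is the mean score over n independent texts.
    human = rng.normal(0.0, 1.0, size=(trials, n)).mean(axis=1)
    machine = rng.normal(0.25, 1.0, size=(trials, n)).mean(axis=1)
    print(f"n = {n:3d}  AUROC ~ {auroc(machine, human):.3f}")

Under these assumptions the printed AUROC rises from roughly 0.57 at n = 1 to about 0.96 at n = 100, mirroring the qualitative effect of larger sample sizes and longer sequences that the paper reports.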

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-chakraborty24a,
  title     = {Position: On the Possibilities of {AI}-Generated Text Detection},
  author    = {Chakraborty, Souradip and Bedi, Amrit and Zhu, Sicheng and An, Bang and Manocha, Dinesh and Huang, Furong},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {6093--6115},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/chakraborty24a/chakraborty24a.pdf},
  url       = {https://proceedings.mlr.press/v235/chakraborty24a.html},
  abstract  = {Our study addresses the challenge of distinguishing human-written text from Large Language Model (LLM) outputs. We provide evidence that this differentiation is consistently feasible, except when human and machine text distributions are indistinguishable across their entire support. Employing information theory, we show that while detecting machine-generated text becomes harder as it nears human quality, it remains possible with adequate text data. We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. This research paves the way for advanced detection methods. Our comprehensive empirical tests, conducted across various datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors like RoBERTa-Large/Base-Detector and GPTZero, with increasing sample sizes and sequence lengths. Our findings align with OpenAI’s empirical data related to sequence length, marking the first theoretical substantiation for these observations.}
}
Endnote
%0 Conference Paper
%T Position: On the Possibilities of AI-Generated Text Detection
%A Souradip Chakraborty
%A Amrit Bedi
%A Sicheng Zhu
%A Bang An
%A Dinesh Manocha
%A Furong Huang
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-chakraborty24a
%I PMLR
%P 6093--6115
%U https://proceedings.mlr.press/v235/chakraborty24a.html
%V 235
%X Our study addresses the challenge of distinguishing human-written text from Large Language Model (LLM) outputs. We provide evidence that this differentiation is consistently feasible, except when human and machine text distributions are indistinguishable across their entire support. Employing information theory, we show that while detecting machine-generated text becomes harder as it nears human quality, it remains possible with adequate text data. We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. This research paves the way for advanced detection methods. Our comprehensive empirical tests, conducted across various datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors like RoBERTa-Large/Base-Detector and GPTZero, with increasing sample sizes and sequence lengths. Our findings align with OpenAI’s empirical data related to sequence length, marking the first theoretical substantiation for these observations.
APA
Chakraborty, S., Bedi, A., Zhu, S., An, B., Manocha, D. & Huang, F. (2024). Position: On the Possibilities of AI-Generated Text Detection. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:6093-6115. Available from https://proceedings.mlr.press/v235/chakraborty24a.html.
