[edit]
Position: On the Possibilities of AI-Generated Text Detection
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:6093-6115, 2024.
Abstract
Our study addresses the challenge of distinguishing human-written text from Large Language Model (LLM) outputs. We provide evidence that this differentiation is consistently feasible, except when human and machine text distributions are indistinguishable across their entire support. Employing information theory, we show that while detecting machine-generated text becomes harder as it nears human quality, it remains possible with adequate text data. We introduce guidelines on the required text data quantity, either through sample size or sequence length, for reliable AI text detection, through derivations of sample complexity bounds. This research paves the way for advanced detection methods. Our comprehensive empirical tests, conducted across various datasets (Xsum, Squad, IMDb, and Kaggle FakeNews) and with several state-of-the-art text generators (GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, Llama-2-70B-Chat-HF), assess the viability of enhanced detection methods against detectors like RoBERTa-Large/Base-Detector and GPTZero, with increasing sample sizes and sequence lengths. Our findings align with OpenAI’s empirical data related to sequence length, marking the first theoretical substantiation for these observations.