detectai.media

Can AI-generated text even be detected?

Sometimes, but only for cooperatively watermarked text. Captured media leaves an acquisition fingerprint; text leaves none, so post-hoc detection only guesses.

By The DetectAI team

Sometimes, but almost never in the situation where it matters. A photograph or a voice recording carries an acquisition fingerprint, a physical trace of the sensor, lens, microphone and codec that made it, and a detector can read that trace. Text carries no such fingerprint. Post-hoc detection therefore cannot measure where a passage came from; it can only guess authorship from statistical style, and that guess collapses on arbitrary, edited, non-cooperative writing. The one genuine exception is a watermark the generator injected as it wrote, and that covers only cooperatively generated text, checked with a key the accuser typically does not hold.

Why captured media can be detected and text cannot

Image and audio forensics work because capture leaves a record. A camera sensor stamps a photo with pattern noise, colour-filter interpolation and JPEG quantisation signatures; a microphone, room and codec stamp a recording along the chain from source to encoding. Those traces sit in the artifact whether or not anyone cooperates, and detection is at root the act of reading them. Text has nothing equivalent. The same characters result whether a sentence was typed, dictated, generated, or generated and then heavily revised, and the same string can arrive through a document, a web form or a copied note. Once a passage is reduced to text there is no sensor, room or codec history left to inspect, so a post-hoc detector falls back on distributional properties of the words, chiefly perplexity and burstiness. That is an inference about style, not a measurement of provenance.
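To make "distributional properties of the words" concrete, here is a toy sketch of the two features named above. It assumes a smoothed unigram model built from a small reference text as a stand-in for a real language model, and it takes burstiness to mean the spread of per-sentence perplexity; both choices, and every function name, are illustrative assumptions rather than any particular detector's method.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real model tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_model(reference_text):
    """Return a function giving an add-one-smoothed unigram probability."""
    counts = Counter(tokenize(reference_text))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    return lambda word: (counts[word] + 1) / (total + vocab)


def perplexity(tokens, prob):
    """exp of the mean negative log-probability of the tokens."""
    if not tokens:
        raise ValueError("cannot score an empty token list")
    nll = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(nll)


def style_features(passage, prob):
    """Mean per-sentence perplexity, and its spread as a burstiness proxy."""
    sentences = [s for s in re.split(r"[.!?]+", passage) if tokenize(s)]
    if not sentences:
        raise ValueError("passage contains no scorable sentence")
    scores = [perplexity(tokenize(s), prob) for s in sentences]
    return {"perplexity": mean(scores), "burstiness": pstdev(scores)}
```

Note that `style_features` takes only the string. Two identical passages score identically whoever produced them, which is the sense in which the score describes style and not provenance.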

The real divide is captured versus synthesised, not text versus image

It is tempting to treat text as a special case. It is not. The dividing line is the mode of production, not the medium: media captured from the physical world carry a fingerprint, media synthesised from scratch carry none, and that cut runs through the image and audio domains too. A hand-drawn or digitally painted picture in an AI-like style has no sensor noise to find, exactly as AI-style text has no signature to find. Text is simply the purest case of synthesised media, which is why provenance by fingerprint fails most cleanly there. Nothing in the artifact can rule out that a human drew a picture to look like AI, and for the same reason nothing in a passage can rule out that a human wrote it to look like AI.

Three different problems hide inside “detecting AI text”

The phrase bundles three tasks that should never be conflated. The first is post-hoc detection: guessing authorship from the text alone, with no cooperation. DetectGPT (Mitchell et al., ICML 2023) is a clean example, and its authors are explicit that it “does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text.” The second is generator-side watermarking, a signal injected during sampling, available only if the model provider opts in. The third is provider-side retrieval: matching a suspect passage against the provider’s own log of what it generated. Krishna et al. (NeurIPS 2023) build such a defence and detect 80% to 97% of paraphrased generations at a 1% false-positive rate, but state plainly that it “must be maintained by a language model API provider.” Only the first is available to a school, court or publisher, and it is the one that works least well.
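The third task is the easiest to picture in code, because it is a lookup and not a guess. The sketch below assumes a hypothetical in-memory `generation_log` and swaps in plain bag-of-words cosine similarity for the semantic retriever a real system would use; it is not the method of Krishna et al., only the shape of the idea.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector for a passage."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_match(suspect, generation_log):
    """Return (similarity, logged_text) for the closest logged generation."""
    if not generation_log:
        raise ValueError("the generation log is empty")
    query = bag_of_words(suspect)
    return max((cosine(query, bag_of_words(g)), g) for g in generation_log)
```

The function is only as useful as the log passed to it, which is why this route is closed to anyone who is not the provider.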

Granting the strongest case: the injected watermark

The watermark is the one place text does carry a deliberate, readable trace, and the honest argument has to grant it at full strength. The Kirchenbauer green-list watermark (Kirchenbauer et al., ICML 2023) biases generation so the output is detectable “from short spans of tokens (as few as 25 tokens)” at “negligible impact on text quality.” The reliability follow-up is sharper still: the watermark survives human and machine paraphrase through n-gram leakage, and after strong human paraphrasing it is still detectable “after observing 800 tokens on average, when setting a 1e-5 false positive rate” (Kirchenbauer et al., ICLR 2024). But a watermark exists only where the generator chose to insert it, so it covers cooperatively generated text and nothing else.
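A minimal sketch of how a green-list check can work, under stated assumptions: the green list for each position is derived here from a SHA-256 hash of a secret `key`, the previous token and the candidate token, which simplifies the seeded vocabulary partition the paper describes. The z-score is the standard one-proportion test on the green count, z = (green − γT) / sqrt(Tγ(1 − γ)), with γ the green fraction and T the number of scored tokens. All function names are illustrative.

```python
import hashlib
import math


def is_green(prev_token, token, key, gamma=0.5):
    """Pseudo-randomly place `token` on the green list, seeded by key and previous token."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    # Map the first 8 bytes to [0, 1) and compare with the green fraction.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(tokens, key, gamma=0.5):
    """One-proportion z-test on the number of green tokens in the sequence."""
    scored = len(tokens) - 1  # the first token has no predecessor to seed from
    if scored < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def z_if_all_green(scored, gamma=0.5):
    """Best-case z-score when every scored token lands on the green list."""
    return (scored - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


print(z_if_all_green(25))  # 5.0
```

Under these assumptions a fully green run of 25 scored tokens at γ = 0.5 already sits five standard deviations from chance. The check also needs the `key`: without it, `is_green` cannot be evaluated at all.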

Without cooperation, the guess collapses

The open case is the one people actually fight over: a real person wrote something, an institution holds no model log, and nobody knows whether any generator watermarked anything. Only the post-hoc guess is available there, and it does not survive contact with ordinary use. Paraphrase is the cheapest attack and needs no model access: the DIPPER paraphraser drops DetectGPT from 70.3% to 4.6% detection at a fixed 1% false-positive rate (Krishna et al., NeurIPS 2023). The benchmarks built for real conditions agree. RAID (Dugan et al., ACL 2024), spanning over 6 million generations across 11 models, 8 domains and 11 adversarial attacks, finds that changing the generator, the decoding strategy or adding a repetition penalty is “enough to introduce up to 95+% error rate,” and that cross-model accuracy “rarely achieves beyond 60%.” A tool tuned on one model, reading long unedited text, can look impressive; the same tool on an unseen model after a light rewrite is close to a coin toss.
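Figures such as "detection at a fixed 1% false-positive rate" come from a two-step procedure: calibrate a threshold on human-written text so that at most 1% of it is flagged, then count how much machine text clears that threshold. A minimal sketch, assuming higher scores mean "more machine-like" and using hypothetical function names:

```python
def threshold_at_fpr(human_scores, target_fpr):
    """Threshold such that at most target_fpr of human scores lie strictly above it."""
    if not human_scores:
        raise ValueError("need at least one human score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # humans we may wrongly flag
    return ranked[allowed]


def detection_rate(machine_scores, threshold):
    """Fraction of machine-text scores strictly above the threshold."""
    if not machine_scores:
        raise ValueError("need at least one machine score")
    return sum(s > threshold for s in machine_scores) / len(machine_scores)
```

The threshold is fixed by the human scores alone. A paraphrase that pushes machine scores down toward the human range leaves the threshold where it was, so the detection rate is the only number that can move, and it moves down.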

The theory is contested, and the case does not need it

There is a theoretical argument that detection becomes impossible as models improve: Sadasivan et al. (2023) bound the best-possible detector by the overlap between human and machine text, so that as the two distributions converge the ceiling falls toward a coin flip. It is worth knowing that this is genuinely contested. As surveyed by Ghosal et al. (2023), the counter is that more and longer text samples can buy detectability back, lifting the ceiling at any fixed gap. The case here rests on neither side of that debate. It rests on the empirical record, which shows post-hoc detection failing under paraphrase, on unseen models, and on real human writing, regardless of how the theorem resolves.
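Both sides of the debate can be put in numbers. The Sadasivan et al. bound caps a detector's AUROC at 1/2 + TV − TV²/2, where TV is the total variation distance between the machine and human text distributions. The counter-argument is that TV is not fixed: measured over n independent samples instead of one, it grows. The sketch below illustrates that with a deliberately toy model, two Bernoulli sources with success rates `p` and `q`, which is an assumption made for exact arithmetic and not a claim about real text.

```python
from math import comb


def auroc_upper_bound(tv):
    """Ceiling on detector AUROC given total variation distance tv."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must be in [0, 1]")
    return 0.5 + tv - tv**2 / 2


def tv_between_bernoulli_samples(p, q, n):
    """Exact TV distance between n i.i.d. draws from Bernoulli(p) and Bernoulli(q)."""
    # Sequences with the same number of successes k have the same probability,
    # so the sum over all 2**n sequences collapses to a sum over k.
    return 0.5 * sum(
        comb(n, k) * abs(p**k * (1 - p) ** (n - k) - q**k * (1 - q) ** (n - k))
        for k in range(n + 1)
    )


for n in (1, 10, 100):
    tv = tv_between_bernoulli_samples(0.5, 0.6, n)
    print(n, round(tv, 3), round(auroc_upper_bound(tv), 3))
```

With one sample the two toy sources are a distance of 0.1 apart and the ceiling is 0.595; with more samples the distance, and the ceiling with it, rises. That shows how sample count lifts the bound in principle, not that real detectors reach it.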

The gap is narrowing, not widening

One forward-looking point sharpens the picture, offered as a direction of travel rather than a settled finding. Humans absorb the style of their environment, and the environment is increasingly the model. Kobak et al. (Science Advances 2024) infer from word-frequency shifts that at least 13.5% of 2024 biomedical abstracts were processed with a language model. Yakura et al. (2024), analysing 740,249 hours of spoken English, report a “measurable and abrupt increase in the use of words preferentially generated by ChatGPT” after its release, in speech, where no tool can be pasted in. As human writing drifts toward model style, the gap a post-hoc detector depends on shrinks, which makes the problem harder over time, not easier.

What the strongest objections show

Three objections are worth answering head-on, because the ruling was reached through them. First, this is not a matter of dismissing the best detectors as hype. Grant Binoculars at over 90% detection of ChatGPT text at a 0.01% false-positive rate (Hans et al., ICML 2024), Pangram at a 0.19% false-positive rate (Emi and Spero, Pangram Labs 2024), and SynthID across roughly 20 million responses (Dathathri et al., Nature 2024) at face value, and they still fail the consequential-use test, because each is measured in distribution on long, unedited, known-model text, and paraphrase and unseen models remove exactly those conditions. The missing independent replication of those vendor numbers cuts against the steelman, not for it. Second, the convergence trend above is a direction, not a load-bearing claim; reject it entirely and the verdict is unchanged. Third, no theoretical impossibility is claimed at all: the theorem is conceded contested, and the empirical record carries the argument on its own.

The ruling

So can AI-generated text be detected? Where the generator watermarked it or the provider logged it, yes, within stated bounds, and that is provenance you can trust if you hold the key. Where you are judging a stranger’s arbitrary prose, no, not reliably enough to carry a consequence. A text detector is a scoped provenance tool, not a general truth machine. For the reliability verdict this feeds, see “are AI text detectors reliable”; for why the watermark, even granted, is the wrong thing to build an accusation on, see “why AI text watermarking is a bad idea”. The honest answer to the title is conditional, and the condition is the whole story: detection reads a fingerprint when one was deliberately put there, and guesses when it was not.

Sources

  • Mitchell, Lee, Khazatsky et al. (2023). DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature. ICML 2023.
  • Krishna, Song, Karpinska et al. (2023). Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. NeurIPS 2023.
  • Kirchenbauer, Geiping, Wen et al. (2023). A Watermark for Large Language Models. ICML 2023.
  • Kirchenbauer, Geiping, Wen et al. (2024). On the Reliability of Watermarks for Large Language Models. ICLR 2024.
  • Sadasivan, Kumar, Balasubramanian et al. (2023). Can AI-Generated Text be Reliably Detected?
  • Ghosal, Chakraborty, Geiping et al. (2023). Towards Possibilities and Impossibilities of AI-Generated Text Detection: A Survey.
  • Dugan, Hwang, Trhlik et al. (2024). RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors. ACL 2024.
  • Hans, Schwarzschild, Cherepanova et al. (2024). Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-Generated Text. ICML 2024.
  • Emi, Spero (2024). Technical Report on the Pangram AI-Generated Text Classifier. Pangram Labs.
  • Dathathri, See et al. (2024). Scalable watermarking for identifying large language model outputs. Nature 634:818-823.
  • Kobak, González-Márquez, Horvát (2024). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances.
  • Yakura, Lopez-Lopez, Brinkmann (2024). Empirical evidence of Large Language Model’s influence on human spoken communication.
Last updated: 27 June 2026
Category: Reliability