Authors Guild Test Finds Some AI Detectors Accurately Identify Human Writing, While Others Fail

An Authors Guild test shows Pangram and Grammarly correctly identified all human-written texts, while Sidekicker and ZeroGPT misjudged human writing, with Sidekicker flagging every article as mostly AI-generated.

In a recent test conducted by the Authors Guild, AI detectors varied widely in how accurately they identified human-written texts. Pangram and Grammarly correctly identified every human-written text as human, and Originality.ai also performed well. In contrast, Sidekicker and ZeroGPT failed to recognize human writing reliably, with Sidekicker flagging every article as mostly AI-generated. The test used ten Guild articles published between 2020 and 2022, before generative AI became widespread. The results highlight how unevenly current AI detection tools distinguish human from machine-generated content.

The Authors Guild emphasized that even the best-performing detectors should not be the sole basis for decisions, as their accuracy can vary and they are constantly evolving. Pangram CEO Max Spero noted that his detector operates as a black box, with no detailed explanation for why a text is flagged as AI-generated. He explained that language models tend to produce uniform arguments, which can resemble the work of human writers who have mastered clarity and precision. This creates a challenge for detection tools, which may struggle to tell a human writer who has perfected their craft from an AI that has learned to imitate human writing.

The test results mainly indicate that the best-performing tools are optimized to minimize false positives, that is, cases where human text is wrongly flagged as AI-generated. Because every sample in the test was human-written, however, the results do not show whether those tools are equally effective at catching AI-generated content. The Authors Guild warned that false results can have serious consequences for authors, including lost contracts and damaged reputations. The debate over AI detection continues: as AI evolves and gains ground as a writing tool, the usefulness of these detectors remains in question.
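To make the false-positive measure concrete, here is a minimal Python sketch. It is an illustration only, not the Guild's methodology: the function name and the boolean lists are assumptions, and the two lists simply mirror the reported outcomes of a detector that flagged none of the ten human-written articles and one that flagged all ten.

```python
def false_positive_rate(flags):
    """Share of human-written texts wrongly flagged as AI.

    flags: list of bools, one per human-written text; True means "flagged as AI".
    (Hypothetical helper for illustration.)
    """
    if not flags:
        raise ValueError("need at least one human-written sample")
    return sum(flags) / len(flags)


# Ten human-written articles, matching the size of the test set.
none_flagged = [False] * 10  # a detector that flagged none of them
all_flagged = [True] * 10    # a detector that flagged every one

print(false_positive_rate(none_flagged))  # 0.0
print(false_positive_rate(all_flagged))   # 1.0
```

A test set made up only of human-written texts can yield this rate and nothing else. Measuring how often a detector catches AI-generated text would require a second set of samples known to be machine-written.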

Source: The Decoder

Key points

Pangram and Grammarly correctly identified every human-written text as human in the Authors Guild test.
Sidekicker and ZeroGPT failed to accurately detect human writing, with Sidekicker flagging every article as mostly AI-generated.
The test used ten Guild articles published between 2020 and 2022, before generative AI went mainstream.
Pangram CEO Max Spero explained that his detector operates as a black box with no detailed explanation for flagging texts as AI-generated.
According to Spero, language models tend to produce uniform arguments, which can resemble the work of human writers who have mastered clarity and precision.
The Authors Guild warned that false results from AI detectors can cost authors their contracts and reputations.
The test results mainly show that the best-performing tools are optimized to minimize false positives, that is, cases where human text is wrongly flagged as AI-generated.

WRITTEN BY

Nadia Rahman

AI Safety, Alignment & Policy

Nadia follows AI safety, alignment, regulation, and the policy debates shaping the field.