In a recent test conducted by the Authors Guild, several AI detectors showed widely varying accuracy in recognizing human-written texts. Pangram and Grammarly correctly identified every human-written text as human, while Originality.ai also performed well. In contrast, Sidekicker and ZeroGPT misclassified human writing, with Sidekicker flagging every article as mostly AI-generated. The test used ten Guild articles published between 2020 and 2022, before generative AI became widespread. The results highlight how unevenly current AI detection tools distinguish human from machine-generated content.
The Authors Guild emphasized that even the best-performing detectors should not be the sole basis for decisions, because their accuracy varies and the tools are constantly evolving. Pangram CEO Max Spero noted that his detector operates as a black box and gives no detailed explanation of why a text is flagged as AI-generated. He explained that language models tend to produce uniform arguments, which can resemble the work of human writers who have mastered clarity and precision. This makes it hard for detection tools to tell a human writer who has perfected their craft from an AI that has learned to imitate human writing.
The test results mainly indicate that the better-performing tools are optimized to minimize false positives, that is, cases where human text is wrongly flagged as AI-generated. A low false positive rate, however, does not necessarily mean they are equally effective at catching AI-generated content. The Authors Guild warned that false results can have serious consequences for authors, including lost contracts and damaged reputations. The debate over AI detection continues, and the usefulness of these detectors remains in question given how quickly AI is evolving and its potential as a writing tool.
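To make the distinction between false positives and detection concrete, here is a minimal sketch in Python. It assumes only what the summary above reports: ten human-written articles, none flagged by Pangram or Grammarly, and all ten flagged by Sidekicker. The function name `rate` and the variable names are illustrative, not part of any detector's API. Because the test contained no AI-generated texts, the detection rate cannot be computed from it at all, which the sketch represents as `None`.

```python
from typing import Optional


def rate(flagged: int, total: int) -> Optional[float]:
    """Return flagged / total, or None when there are no texts to measure."""
    if total == 0:
        return None
    return flagged / total


# Counts taken from the summary above: ten human-written articles,
# and no AI-generated texts in the test set.
HUMAN_TEXTS = 10
AI_TEXTS = 0

# Number of human-written articles each detector flagged as AI.
flagged_human = {"Pangram": 0, "Grammarly": 0, "Sidekicker": 10}

for name, flagged in flagged_human.items():
    false_positive_rate = rate(flagged, HUMAN_TEXTS)
    # No AI texts were tested, so nothing could be "caught": the rate is undefined.
    detection_rate = rate(0, AI_TEXTS)
    print(f"{name}: false positive rate = {false_positive_rate}, "
          f"detection rate = {detection_rate}")
```

A detector that never flags anything would also score a perfect false positive rate on this test, so measuring detection would require a separate set of texts known to be AI-generated.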
Source: thedecoder