Recently, there has been a significant rise in the number and popularity of generative AIs that can produce text almost indistinguishable from human writing. As a result, countless websites have popped up claiming they can determine whether a given piece of text was written by an AI.
AI text detectors work by using AI themselves. Generally, the detector breaks the text into individual sentences. For each sentence, it looks at every word before the last one and, drawing on the massive amount of data it was trained on, predicts which final word an AI would most likely choose. It then compares that predicted word to the sentence's actual last word. The more often the two match, the higher the probability that the text wasn't created by a human. In addition, some systems take into account how much the text varies in aspects like word choice and sentence length.
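The two signals described above can be sketched in a few lines of code. This is a minimal illustration, not how any real detector is implemented: the `predict` function here is a hypothetical stand-in for a trained language model, and the variation measure is simply the spread of sentence lengths.

```python
import re
from statistics import pstdev

def last_word_match_rate(text, predict):
    """Fraction of sentences whose actual last word matches the
    model's prediction. `predict` is a placeholder for a trained
    language model: it takes the preceding words and guesses the
    final word of the sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    hits = 0
    for s in sentences:
        words = s.lower().split()
        if len(words) < 2:
            continue
        if predict(words[:-1]) == words[-1]:
            hits += 1
    return hits / len(sentences)

def sentence_length_spread(text):
    """Population standard deviation of sentence lengths (in words).
    Very uniform lengths are one crude hint of machine-generated text."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return pstdev(lengths)

# Example with a dummy model that always predicts "cat":
# it matches the first sentence's last word but not the second's.
rate = last_word_match_rate("the dog ate the cat. the dog ran.",
                            lambda context: "cat")
print(rate)  # 0.5
```

A real detector would combine many such signals, computed with an actual language model over every token rather than just the last word of each sentence.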
There are tons of AI text detection tools online, such as GLTR, Sapling, Copyleaks, Crossplag, Content at Scale, Kazan SEO, and Originality. To test their accuracy, I asked ChatGPT various questions and then found human-written answers to the same questions online. I also asked ChatGPT to write blog posts based on the titles of my own blog posts, then ran both its output and my original posts through the detectors. Below are my findings for a few of the most popular and highly acclaimed free AI text detectors I found. Note that I used the same set of texts to test each tool.
Writer
This AI text detector accurately labeled every piece of text written by humans. However, it also classified almost every AI-generated piece of text as written by humans, correctly identifying only one response from ChatGPT as artificial. Overall, I found this to be an extremely unreliable tool.
ZeroGPT
While ZeroGPT did better than Writer at identifying AI-generated answers, it still performed quite poorly in that regard, classifying only about half of ChatGPT's responses as AI-generated. As for human-made text, ZeroGPT was correct the vast majority of the time, with a few mistakes. Overall, ZeroGPT isn't a very reliable tool for AI text detection.
OpenAI
I expected OpenAI’s AI text detector to perform much better than Writer’s and ZeroGPT’s, considering that OpenAI created ChatGPT. However, the results were quite disappointing: like Writer’s detector, it found every piece of text to be likely written by a human. OpenAI does state that the tool is still a work in progress, and it may well improve once it is complete. For the time being, though, I would steer clear of OpenAI’s AI text detector as well.
Winston
It is important to note that Winston’s AI text detector is only free for the first 2,000 words; after that, a paid subscription is required. Regardless, its performance was fairly strong: it flagged every ChatGPT response as AI-generated with extreme confidence. It was less consistent on human text, however, with results that fluctuated greatly. Many human samples were classified as partly AI-made, others as completely AI-generated, and some were accurately judged free of AI use.
Content at Scale
This AI detector performed similarly to many of the other tools: almost every sample, AI-generated or not, came back as likely written by a human. The closest it came to correctly flagging an AI-generated piece was labeling one as likely written by both a human and an AI.
Overall, accurate AI text detection is very difficult, and it’s important to take the results of these tools with a grain of salt, as they are generally quite unreliable.