The AI Detection Arms Race Is On

Edward Tian didn’t think of himself as a writer. As a computer science major at Princeton, he’d taken a couple of journalism classes, where he learned the basics of reporting, and his sunny affect and tinkerer’s curiosity endeared him to his teachers and classmates. But he describes his writing style at the time as “pretty bad”—formulaic and clunky. One of his journalism professors said that Tian was good at “pattern recognition,” which was helpful when producing news copy. So Tian was surprised when, sophomore year, he managed to secure a spot in John McPhee’s exclusive non-fiction writing seminar.

Every week, 16 students gathered to hear the legendary New Yorker writer dissect his craft. McPhee assigned exercises that forced them to think rigorously about words: Describe a piece of modern art on campus, or prune the Gettysburg Address for length. Using a projector and slides, McPhee shared hand-drawn diagrams that illustrated different ways he structured his own essays: a straight line, a triangle, a spiral. Tian remembers McPhee saying he couldn’t tell his students how to write, but he could at least help them find their own unique voice.

This article appears in the October 2023 issue of WIRED.

Photograph: Jessica Chou

If McPhee stoked a romantic view of language in Tian, computer science offered a different perspective: language as statistics. During the pandemic, he’d taken a year off to work at the BBC and intern at Bellingcat, an open source journalism project, where he’d written code to detect Twitter bots. As a junior, he’d taken classes on machine learning and natural language processing. And in the fall of 2022, he began to work on his senior thesis about detecting the differences between AI-generated and human-written text.

When ChatGPT debuted in November, Tian found himself in an unusual position. As the world lost its mind over this new, radically improved chatbot, Tian was already familiar with the underlying GPT-3 technology. And as a journalist who’d worked on rooting out disinformation campaigns, he understood the implications of AI-generated content for the industry.

While home in Toronto for winter break, Tian started playing around with a new program: a ChatGPT detector. He posted up at his favorite café, slamming jasmine tea, and stayed up late coding in his bedroom. His idea was simple. The software would scan a piece of text for two factors: “perplexity,” the randomness of word choice; and “burstiness,” the complexity or variation of sentences. Human writing tends to rate higher than AI writing on both metrics, which allowed Tian to guess how a piece of text had been created. Tian called the tool GPTZero—the “zero” signaled truth, a return to basics—and he put it online the evening of January 2. He posted a link on Twitter with a brief introduction. The goal was to combat “increasing AI plagiarism,” he wrote. “Are high school teachers going to want students using ChatGPT to write their history essays? Likely not.” Then he went to bed.
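To make the two metrics concrete, here is a minimal Python sketch. It is not GPTZero's actual code, which the article does not describe in detail. It assumes a toy unigram word model with add-one smoothing standing in for a real language model such as GPT-3, and it assumes "burstiness" can be approximated as the spread of per-sentence perplexity. The names `UnigramModel`, `perplexity`, and `burstiness` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class UnigramModel:
    """Toy stand-in for a real language model (an assumption of this sketch).

    Word probabilities come from counts in a reference text, with add-one
    smoothing so that unseen words still get a small nonzero probability.
    """

    def __init__(self, reference_text):
        tokens = tokenize(reference_text)
        self.counts = Counter(tokens)
        self.total = len(tokens)
        self.vocab_size = len(self.counts) + 1  # one extra slot for unseen words

    def prob(self, word):
        return (self.counts[word] + 1) / (self.total + self.vocab_size)


def perplexity(model, text):
    """Exponential of the average negative log-probability per token.

    Higher values mean the model found the word choices more surprising.
    """
    tokens = tokenize(text)
    if not tokens:
        return float("nan")
    avg_log_prob = sum(math.log(model.prob(t)) for t in tokens) / len(tokens)
    return math.exp(-avg_log_prob)


def burstiness(model, text):
    """Assumed definition: standard deviation of per-sentence perplexity."""
    scores = [perplexity(model, s) for s in split_sentences(text)]
    scores = [s for s in scores if not math.isnan(s)]
    return pstdev(scores) if len(scores) > 1 else 0.0
```

Under these assumptions, a detector would compute both scores for a passage and treat low values on both as a hint of machine generation. Any cutoff would have to be calibrated on known human and AI samples, and a real system would use a far stronger model than word counts.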

Tian woke up the next morning to hundreds of retweets and replies. There was so much traffic to the host server that many users couldn’t access it. “It was totally crazy,” Tian says. “My phone was blowing up.” A friend congratulated him on winning the internet. Teens on TikTok called him a narc. “A lot of the initial hate was like, ‘This kid is a snitch, he doesn’t have a life, he never had a girlfriend,’” says Tian with a grin. “Classic stuff.” (Tian has a girlfriend.) Within days, he was fielding calls from journalists around the world, eventually appearing on everything from NPR to the South China Morning Post to Anderson Cooper 360. Within a week, his original tweet had reached more than 7 million views.

via Wired Top Stories

September 14, 2023 at 05:03AM
