How the Granta Prize controversy sparked debate

It is not unusual for criticism to follow the announcement of a literary award. However, the uproar that followed last month, when the British literary magazine Granta published the regional winners of an annual short story prize, involved a different kind of criticism: allegations about the use of artificial intelligence (AI).

Within days of the magazine announcing this year’s winners of the Commonwealth Short Story Prize, suspicions arose that some of the winning stories showed signs of AI-generated text. The incident not only highlighted the increasing use of AI in creative and other forms of writing, but also brought scrutiny to the role of tools that claim to detect AI-generated text. We explain.

The controversy

Since 2012, Granta has published the prize’s winning stories in partnership with the Commonwealth Foundation. Winners are chosen from five regions: Africa, Asia, Canada and Europe, the Caribbean, and the Pacific. The overall winner will be announced on June 30.

Days after the magazine announced the winners, many social media users began calling out Trinidadian author Jameel Nazir’s short story “The Serpent in the Grove” (the Caribbean winner). Citing the AI detection tool Pangram, one user called it “100% AI-generated” and “some kind of Turing test.”

The Turing test, proposed by British mathematician Alan Turing in 1950, examines a machine’s ability to exhibit intelligent behavior that is indistinguishable from that of a human. It is still widely regarded as a benchmark for AI.

Similar allegations, again citing Pangram, were made against two other winners: Indian author Sharon Alparail (Asia) and Malta’s John Edward Demicoli (Canada and Europe). However, the stories by the remaining two, Lisa-Anne Julien (South Africa, Africa) and Holly-Ann Miller (New Zealand, Pacific), were rated as “completely written by humans.”

In a written response to The Indian Express, Alparail said that “no AI tools were used at any stage of the writing, editing, or development process” for her story.

Machine learning

To understand how tools that claim to detect AI-generated text work, one first needs to know a little about machine learning (ML). Simply put, ML refers to the use of data and statistics to build AI systems. Large data sets are fed into computers, which learn statistical patterns from them, with the aim of performing tasks at human-like or even superhuman levels.

“You take a bunch of examples of AI-written content and human-written content and feed them to a big model to do the classification. The model learns signals through the data, like, ‘Oh, AI models tend to use em dashes,’ or they tend to use the words ‘imperative’ or ‘dig.’ It is a statistical pattern that the model can learn from,” Pruthi of the Indian Institute of Science, Bangalore, told The Indian Express.
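
The classification idea described in the quote can be illustrated with a toy sketch. This is not how Pangram or any real detector works: it is a minimal naive Bayes word counter, and the four training sentences, the labels and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data; real detectors learn from vastly larger corpora.
TOY_EXAMPLES = [
    ("It is imperative to delve into the data \u2014 a rich tapestry of insight.", "ai"),
    ("Let us delve deeper \u2014 it is imperative to act.", "ai"),
    ("I missed the bus again and walked home in the rain.", "human"),
    ("My aunt's mango chutney recipe never works for me.", "human"),
]


def tokenize(text):
    # Lowercased words, with the em dash (U+2014) kept as its own token.
    return re.findall(r"[a-z']+|\u2014", text.lower())


def train(examples):
    """Count how often each token appears in AI-labeled and human-labeled text."""
    token_counts = {"ai": Counter(), "human": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        token_counts[label].update(tokenize(text))
        doc_counts[label] += 1
    return token_counts, doc_counts


def ai_log_odds(text, token_counts, doc_counts):
    """Naive Bayes log-odds (add-one smoothing): positive leans AI, negative leans human."""
    vocab = set(token_counts["ai"]) | set(token_counts["human"])
    totals = {label: sum(counts.values()) for label, counts in token_counts.items()}
    score = math.log(doc_counts["ai"] / doc_counts["human"])
    for token in tokenize(text):
        if token not in vocab:
            continue  # tokens never seen in training carry no signal here
        p_ai = (token_counts["ai"][token] + 1) / (totals["ai"] + len(vocab))
        p_human = (token_counts["human"][token] + 1) / (totals["human"] + len(vocab))
        score += math.log(p_ai / p_human)
    return score


token_counts, doc_counts = train(TOY_EXAMPLES)
print(ai_log_odds("It is imperative to delve \u2014 now.", token_counts, doc_counts))
print(ai_log_odds("I walked home in the rain.", token_counts, doc_counts))
```

The first score comes out positive and the second negative, purely because of which words and punctuation marks appeared under each label in the toy data. That is the sense in which such a model learns a statistical pattern rather than any understanding of the text.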

When allegations about AI use started flying around after the prize was announced, many pointed to “tells,” or signs that a text was generated by AI. (The term comes from poker, where it refers to unconscious changes in a player’s body language that give hints about their next move.)

In addition to em-dashes and specific words, Pruthi said, these tells include text organized into bullet points, often with headings that indicate what the bullet points are about. AI-generated text also tends to conclude things neatly. “The human conclusion may introduce new content, but the model’s conclusion rarely does,” he said.

He also gave the example of “negative parallelism,” a rhetorical device characterized by a formulaic “not X, but Y” structure. “For example, ‘These headphones aren’t just hearing devices, they’re also sound-canceling devices.’ Models are doing that quite routinely now,” he said.

As for where these tells come from, Pruthi said research is ongoing but there are no clear answers yet.

“One common hypothesis is that you pre-train a model and then post-train it to make it safe and useful and able to follow instructions. This is typically done by contract annotators or data vendors who create samples that answer different types of questions,” he said.

“Many of these datasets are private and built by large frontier laboratories, and they have these clues. The people who are writing those answers are writing them this way, so the model reproduces that behavior,” he added.

Will ChatGPT/Claude be a good detector?

Pruthi said using ML to distinguish whether something was written by a human or an AI is just one approach, and it is the one used by Pangram, the AI detector at the heart of the controversy. He added that people now want to move beyond the “binary paradigm” of AI versus human. “They (along with the research community as a whole) want to understand the extent of collaboration: Is it light AI assistance, moderate AI assistance, [or] is it heavily aided by AI?”

In a statement after the allegations surfaced, Granta’s publisher Sigrid Rausing said the magazine had used the AI chatbot Claude to assess whether Nazir’s story was “generated by AI or not.” “The response was long and concluded that it was almost certainly not produced without human intervention,” the statement said.

According to Pruthi, asking Claude, ChatGPT, or Gemini whether something was written by an AI is a “very bad idea.” “The model isn’t specifically trained for this, so it might make an educated guess, but it’s not very accurate… This is a task where accuracy matters a great deal because the stakes are high,” he said.

Another key difference is that many detectors are calibrated to have “fewer false positives,” Pruthi said. A false positive is when a detector flags something written by a human as AI-generated. A false negative is the opposite: AI-generated text that passes as human-written.

“So we will be specifically setting our models and thresholds so that the chances of human-written text being erroneously flagged as AI are very low,” he added.

AI detectors are also different from tools that detect plagiarism. Pruthi said these are “two very different tasks.” Because plagiarism primarily involves copying intellectual work without attribution, “plagiarism detectors tend to score how closely a particular idea or work matches existing work.”

“In contrast, an AI detector is just trying to estimate whether a piece of text could have been generated by AI based on the many examples it has seen,” he said.

How reliable are these tools?

Pruthi said Pangram, which claims a false positive rate of 1 in 10,000 (0.01%), is highly reliable, an assessment he said is supported by several independent studies.

But he cautioned that ML models are “obviously not 100% accurate all the time.” Using the analogy of email spam classification, he said: “We’re still developing ML models to detect what is spam and what isn’t. There, too, the content, the words used, the way it’s worded, etc. all help determine whether it’s spam or not. There are still some instances where things go wrong, such as when an important email is classified as spam or vice versa.”

These tools also have specific limitations. According to Pruthi, the fewer words there are, the more likely the ML model is to be wrong, because there aren’t enough indicators to tell with confidence whether something was written by an AI or a human.

Another limitation is what Pruthi called “low-entropy text.” This refers to text that is difficult to classify because there is essentially one accurate, precise way to write it, leaving little room for variation.

“Suppose I ask you, ‘Please tell me all the states in India in alphabetical order.’ What you produce is one clear, definitive answer, and what the model generates is the same answer… and you don’t know whether it’s from the model or the human,” he said.
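
One rough way to picture “low entropy” is Shannon entropy, which measures in bits how much a set of responses varies. The sketch below is only an illustration with invented responses; it is an assumption that this simple measure captures what Pruthi means, and real detectors do not necessarily compute it this way.

```python
import math
from collections import Counter


def shannon_entropy_bits(responses):
    """Entropy, in bits, of the empirical distribution over distinct responses."""
    counts = Counter(responses)
    total = len(responses)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical responses from four different writers (human or machine).
factual_answers = ["Andhra Pradesh, Arunachal Pradesh, Assam, Bihar"] * 4
story_openings = [
    "The rain came early that year.",
    "Nobody in the village spoke of the grove.",
    "She counted the coins twice.",
    "At dawn the ferry was already gone.",
]

print(shannon_entropy_bits(factual_answers))  # 0.0
print(shannon_entropy_bits(story_openings))   # 2.0
```

When every writer produces the same answer, the entropy is zero and the text itself carries no trace of who wrote it. Open-ended writing varies far more, which is what gives a detector something to work with.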

Similarly, code (written instructions that tell a computer to perform a specific task) can also be “hard to detect” in some cases. “There are only so many ways to write it,” Pruthi added.

Pruthi mentioned another limitation, which he and his colleagues described in a recent paper at an ML conference: text that a language model has only lightly refined. “Even if the basic idea or content was written by you, the model may flag it as completely AI-generated, rather than saying it has been slightly edited or is mixed text,” he said.

This could deter writers from using AI even to refine their writing, for fear that their work might be incorrectly marked as AI-generated. “The mob will trust the machines, and the machines will be able to control the narrative that should have been ‘written by humans,’” Alparail told The Indian Express.

Pruthi said that while incidents like Alparail’s were unfortunate, AI text detection “saves writers’ lives and careers in other ways.”

“There’s a lot of AI development happening on the internet right now, and many of the Kindle books that are being published are written entirely by AI,” he said. “So if a good detector can filter out most of that and at least label this as AI-generated, an attentive reader may choose not to consume that content. In a sense, a good AI detector is helping to focus attention on legitimately human-written content.”

Impact on writers and publishers

Recently, Nobel Prize-winning Polish author Olga Tokarczuk drew criticism for her comments about using AI in research while writing her last novel. Although Tokarczuk later clarified that she did not use AI in the writing process itself, Pruthi said transparency was key. “If an author is using AI and is benefiting from it, they can appropriately disclose that,” he said.

Jane Friedman, an American publishing expert with more than 20 years’ experience in the industry, agreed. “Everyone needs to be on the same page about where these tools are impacting the process and track how they are being used to the best of everyone’s ability,” she told The Indian Express.

According to her, there was a lack of trust around the use of AI itself. “One of the problems here is that everyone is doing their own thing behind the curtain, and part of that has to do with the taboos around technology and everyone’s insecurities about technology and different attitudes towards it,” she said.

On using AI responsibly, Friedman cited a recent report in The New York Times, which said that a nonfiction book published in the United States contained quotes fabricated by AI. “This is a classic example of people trusting AI too much or not yet having the skills to use it in a way that avoids these kinds of mistakes,” she said.

But she felt that writers would get smarter over time. In the meantime, she said, everyone working in publishing and in related fields such as media and academia had a responsibility to stay on the same page, even though there were “good reasons to be anti-AI.”

“I think it seems somehow childish or naive to just say, ‘I don’t want to be involved in it, you can’t make me.’ At some point, you have to realize that this technology is here,” she said.




