AI hallucinations in research, legal filings, and books are on the rise and getting harder to fix

Maxim Topaz, an associate professor at Columbia University’s School of Nursing, was used to relying on artificial intelligence tools to polish the grammar, formatting, and other details of his scientific papers. But a few weeks after submitting his latest research, he received a query about his references from the journal where he hoped to publish it. The AI tools Topaz was using had been quietly inserting fabricated sources into his work.

“I felt very embarrassed,” Topaz, who leads a team at Columbia University developing AI applications for health care, told Fortune.

“I’m an AI researcher. I know about hallucinations,” he said. “If this happens to me as an AI expert, what will happen to others?”

This near miss sent Topaz on an investigation into how often experts are being tricked by AI. The answer, it turns out, is often.

In a study published earlier this month in The Lancet, Topaz and colleagues audited approximately 2.5 million biomedical articles and 97 million citations indexed in PubMed Central, a central repository used by clinicians and researchers around the world. They found more than 4,000 fabricated references buried across nearly 3,000 papers. While not every fake reference was generated by AI, Topaz said the steady rise in fabricated sourcing went “vertical” in 2024, shortly after AI research tools came into wider use.

“It makes a lot of sense that AI is so closely tied to this now,” he said.

Over the past three years, the rate of fabricated references in the biomedical literature has increased more than twelvefold. In 2023, 1 in 2,828 papers contained at least one fake reference; by last year, that rate had risen to 1 in 458. Researchers found that in the first seven weeks of 2026, 277 papers each contained at least one nonexistent reference.

“I think this is the tip of the iceberg,” Topaz said.

Hallucinations occur when an AI model prioritizes word patterns over accuracy. Although they are often harmless, the stakes change when AI errors begin to infiltrate the academic literature, where hallucinations threaten to undermine the scientific process itself.

Medicine is a field that builds on itself. Clinical trials cite previous studies. Systematic reviews then aggregate those studies, which are ultimately cited in medical guidelines. Doctors and nurses rely on these guidelines when deciding how to treat patients. Fabricated research planted at the start of that chain doesn’t stay there.

“This is the chain of evidence, the way we care for and treat people. You put fabricated research at the bottom of the stack, and the whole structure inherits it,” Topaz said.

“We have already seen papers from paper mills included in systematic reviews that inform clinical guidelines,” he added. “When guideline articles cite articles with partially fictitious reference lists, the chain of evidence-based treatment decisions is undermined.”

AI mistakes happen to everyone

AI’s susceptibility to hallucinations has been known since ChatGPT first appeared. Students brazenly began submitting questionable AI-generated papers under their own names. But now that AI tools, agents, and extensions are ubiquitous in nearly every profession, even experts in their fields are being tripped up by AI.

Take the example of Steven Rosenbaum. This week, the writer and filmmaker made headlines for all the wrong reasons after The New York Times identified a number of inaccurate quotations in his new book, The Future of Truth: How AI Reshapes Reality.

The book featured blurbs from prominent journalists, including Nicholas Thompson of The Atlantic, and a foreword by Maria Ressa, the Filipino journalist and Nobel Peace Prize laureate. It arrived, according to the Times, “to great fanfare.”

Rosenbaum’s book contained more than half a dozen misattributed or entirely fabricated quotations that were apparently generated by an AI tool, whose use he disclosed in the book’s acknowledgments. In a statement to the Times, Rosenbaum acknowledged the errors and called the episode “a warning about the risks of AI-assisted research and validation.”

Given how widely AI is now used for expert-level knowledge work, cases like this may be inevitable. Some news organizations, including Fortune, are currently piloting AI tools in their reporting. Research shows that more than half of legal professionals use AI tools to prepare briefs and memos. According to a recent report from the American Medical Association, more than 80% of physicians now use AI professionally to summarize research and produce clinical documentation, a share that has more than doubled since 2023. Even Nobel laureates, such as literature laureate Olga Tokarczuk, have admitted to using AI in their work.

In research, a study conducted by an American medical journal last year found that 36% of submitted papers included at least some AI-generated text, but only 9% of authors disclosed this when asked during the submission process. Another recent study found that more than half of researchers are likely to use AI tools when peer-reviewing others’ work.

As it turns out, experts in the field are not immune to being fooled either. Topaz’s study of hallucinations in biomedical research joins a growing body of anecdotes and datasets documenting embarrassing mistakes, including legal analyst Damien Charlotin’s catalog of 1,459 legal decisions citing AI-generated inaccuracies. When he started the project a year ago, AI hallucinations in litigation were turning up two to three times a month. Now he sees about five cases a day.

When experts get it wrong

Fake research papers generated by AI are already a problem in academia: they are becoming increasingly difficult to spot and threaten to overwhelm peer review systems. But hallucinated references inside genuine, human-authored research can be just as widespread and even harder to track down.

Most of the papers Topaz tracked contained only one or two fabricated citations among the dozens of references that academic papers typically include, suggesting that most cases of AI hallucination in research are unintentional.

But the publishing industry may be unprepared to deal with the proliferation of false citations, Topaz said. Verification practices differ from journal to journal; some use software to check references and scan for AI-generated content, but enforcement varies widely. There is also no easy mechanism for going back through the chain of evidence to find the original fake study or reference. Few journals appear to have caught these hallucinations after publication: Topaz’s analysis found that 98.4% of papers with fake references had not been retracted by their publishers at the time of the audit.

This is part of what those in the field call science’s “reproducibility crisis,” which in the age of AI is exacerbated by the deluge of useless or unreliable AI-generated content now permeating the academic literature. But the story is similar in other fields whose output becomes the foundation for later work. Newspaper articles spark conversation and form the basis for future research. Legal decisions end up being cited by lawyers and academics in other cases.

Topaz said that AI itself doesn’t have to be the bad guy, and that he’s happy to use it in his work. “The problem is that unverified AI output becomes part of the permanent record,” he said. “The solution is not to stop using tools, but to build validation into your workflow.”

“The longer verification is delayed, the harder it will be to clean up,” he added.

AI hallucinations don’t care how familiar the user is with the subject matter. The errors are built to look real, and they are getting better at hiding. The higher the stakes of the field, such as medicine, law, or journalism, the greater the risk of an undetected error.
