The more advanced artificial intelligence (AI) gets, the more it "hallucinates" and confidently serves up inaccurate information.
A study performed by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, when tested with OpenAI's PersonQA benchmark. That is more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, it appears to do so at the cost of more inaccurate hallucinations.
This raises concerns over the accuracy and reliability of large language models (LLMs), such as AI chatbots, said Eleanor Watson, an Institute of Electrical and Electronics Engineers (IEEE) member and AI ethics engineer at Singularity University.
"When a system outputs fabricated information, such as invented facts, citations or events, with the same fluency and coherence it uses for accurate content, it risks misleading users in subtle and consequential ways," Watson told Live Science.
Related: Cutting-edge AI models from OpenAI and DeepSeek suffer "complete collapse" when problems get too difficult, research shows
Experts say the hallucination problem highlights the need to carefully assess and supervise the information that AI systems produce when using LLMs and reasoning models.
Do AIs dream of electric sheep?
The crux of a reasoning model is that it can handle complex tasks by essentially breaking them down into individual components and devising solutions to tackle them. Rather than trying to spit out an answer based on statistical probability, reasoning models come up with strategies to solve a problem, much as humans do when they think.
In order to develop creative, and potentially novel, solutions to problems, AI actually needs to hallucinate.
"It's important to note that hallucination is a feature, not a bug, of AI," Sohrob Kazerounian, an AI researcher at Vectra AI, told Live Science. "To paraphrase a colleague of mine, 'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.' If an AI only generated verbatim outputs that it had seen during training, all of AI would reduce to a massive search problem."
"You would only be able to generate computer code that had already been written, find proteins and molecules whose properties had already been studied, and answer homework questions that had already been asked. What you could not do, however, is ask the LLM to write the lyrics for a concept album that fuses the lyrical stylings of Snoop Dogg and Bob Dylan and focuses on the idiosyncrasies of AI."
In effect, LLMs and the AI systems they power need to hallucinate in order to create, rather than simply serve up existing information. Conceptually, it is similar to the way humans dream or imagine scenarios when conjuring up new ideas.
Thinking too far outside the box
However, AI hallucinations present a problem when it comes to delivering accurate and correct information, especially if users take that information at face value without any checks or oversight.
"This is especially problematic in domains where decisions depend on factual accuracy, like medicine, law or finance," Watson said. "While more advanced models may reduce the frequency of obvious factual mistakes, the issue persists in subtler forms. Over time, confabulation erodes the perception of AI systems as trustworthy instruments and can produce material harm when unverified content is acted upon."
And this problem appears to be getting worse as AI advances. "As model capabilities improve, errors often become less overt but more difficult to detect," Watson noted. "Fabricated content is increasingly embedded within plausible narratives and coherent reasoning chains. This introduces a particular risk: users may be unaware that errors are present."
Kazerounian backed up this viewpoint. "Despite the general belief that the problem of AI hallucination can and will get better over time, it appears that the most recent generation of advanced reasoning models may have actually begun to hallucinate more than their simpler counterparts," he said.
The situation is further complicated by the fact that it can be very difficult to ascertain how an LLM comes up with its answers; a parallel could be drawn here with how we still don't fully understand, in a comprehensive way, how the human brain works.
In a recent essay, Dario Amodei, the CEO of AI company Anthropic, highlighted a lack of understanding of how AIs come up with answers and information. "When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does, why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate," he wrote.
According to Kazerounian, the problems caused by AI hallucinating inaccurate information are already very real. "There is no universal, verifiable way to get an LLM to correctly answer questions being asked about some corpus of data it has access to," he said. "Examples of non-existent hallucinated references, customer-facing chatbots making up company policy, and so on, are now all too common."
Getting a grip on the dreams
Both Kazerounian and Watson told Live Science that ultimately it might be difficult to eliminate AI hallucinations. However, there may be ways to mitigate the problem.
Watson suggested that "retrieval-augmented generation," which grounds a model's outputs in curated external knowledge sources, could help ensure that AI-generated information is anchored to verifiable data.
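To make that idea concrete, here is a minimal sketch of retrieval-augmented generation, assuming a toy in-memory document store, a naive keyword-overlap retriever and a placeholder generate() function standing in for whatever LLM is actually used; none of these names come from the article, and a real system would use a proper embedding model and vector index.

```python
# Minimal sketch of retrieval-augmented generation (RAG), with a tiny
# in-memory document store and a placeholder generate() function.

DOCUMENTS = [
    "OpenAI's PersonQA benchmark measures how accurately a model answers questions about people.",
    "Retrieval-augmented generation grounds model answers in retrieved reference text.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query and return the top k."""
    query_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call; here it just echoes the prompt it was given."""
    return f"[model output conditioned on]\n{prompt}"

def answer(question: str) -> str:
    # Retrieved passages are prepended so the model is asked to answer from
    # verifiable source text rather than from memory alone.
    context = "\n".join(retrieve(question, DOCUMENTS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

print(answer("What does retrieval-augmented generation do?"))
```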
"Another approach involves introducing structure into the model's reasoning. Prompting it to check its own outputs, compare different perspectives and follow logical steps reduces the risk of unconstrained speculation and improves consistency."
"Finally, systems can be designed to recognize their own uncertainty. Rather than defaulting to confident answers, models can be taught to flag when they are unsure, or to defer to human judgment when appropriate," Watson added. "These strategies don't eliminate the risk of confabulation entirely, but they offer a practical path toward making AI outputs more reliable."
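As an illustration of that last idea (a sketch under stated assumptions, not a method Watson described), an application could compare a confidence score attached to each answer against a threshold and defer to human review when it falls short; ask_model() and the 0.7 cutoff below are hypothetical placeholders.

```python
# Sketch of uncertainty-aware deferral, assuming the model (or a wrapper
# around it) can attach a confidence score to each answer.

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, not a recommended value

def ask_model(question: str) -> tuple[str, float]:
    """Placeholder returning an (answer, confidence) pair from a hypothetical model."""
    return "placeholder answer", 0.55

def answer_or_defer(question: str) -> str:
    answer, confidence = ask_model(question)
    if confidence < CONFIDENCE_THRESHOLD:
        # Flag low-confidence answers instead of stating them as fact.
        return f"Unsure (confidence {confidence:.2f}); deferring to human review."
    return answer

print(answer_or_defer("Example question"))
```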
Given that it may be nearly impossible to eliminate AI hallucination, especially in advanced models, Kazerounian concluded that the information LLMs generate should ultimately be treated with "the same skepticism we reserve for our human counterparts."
