

Image by Editor | ChatGPT
# Introduction
Hallucinations, a persistent worry for language models (LMs) and their users, are plausible but factually false statements produced by LMs. They are problematic because they can erode user trust, propagate misinformation, and mislead downstream decisions, even when the output is expressed with confidence. They are particularly troublesome in scenarios where users cannot easily verify claims (technical answers, medical or legal summaries, data analysis), since a confident delivery of false information masks the underlying uncertainty.
A recent paper, “Why Language Models Hallucinate” by Kalai, Nachum, Vempala, and Zhang, takes on the task of analyzing both the statistical roots of these errors and the socio-technical incentives that keep them alive, as well as what might actually reduce them.
The paper offers several high-level insights into the causes and persistence of LM hallucinations, and we look at five of them here.
# 1. The root cause of hallucinations
tl;dr: Hallucinations are primarily caused by training and evaluation procedures that reward guessing over acknowledging uncertainty.
The central argument of the paper is that hallucinations, defined as plausible but false statements, persist because the procedures used in training and evaluation inadvertently reward confident guessing rather than acknowledging uncertainty. LMs are optimized to act as “good test takers.” In other words, they guess when unsure because doing so maximizes their score under grading schemes that penalize uncertain responses (such as “I don't know,” or IDK). Under a common binary 0-1 scoring scheme, guessing when uncertain maximizes the expected score.
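To make the incentive concrete, here is a small worked calculation. It assumes a single question graded 1 for a correct answer and 0 for anything else, including IDK. The symbol $p$ is introduced here for illustration (it is not notation from the paper) and stands for the probability that the model's best guess is correct.

```latex
\mathbb{E}[\text{score} \mid \text{guess}]
  = p \cdot 1 + (1 - p) \cdot 0
  = p
  \;\ge\; 0
  = \mathbb{E}[\text{score} \mid \text{IDK}]
```

Under these assumptions, guessing is never worse than abstaining, and it is strictly better whenever $p > 0$, however small the chance of being right.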


Proposed prompt to reduce “confident guessing” and encourage “acknowledging uncertainty”
Image by Author | Gemini
# 2. The origin of hallucination
tl;dr: The statistical origin of hallucinations can be reduced to simple errors in binary classification.
The paper demystifies hallucinations by arguing that they are not mysterious but simply originate as errors in binary classification. The analysis connects generative errors (such as hallucinations) to a supervised learning problem called “Is-It-Valid” (IIV) binary classification. If the system cannot statistically distinguish false statements from facts, the statistical objective minimized during pretraining naturally leads to generative errors. The analysis establishes a mathematical relationship: the generative error rate is roughly bounded below by twice the IIV misclassification rate.
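The relationship can be written compactly as follows. The symbols $\text{err}$ and $\text{err}_{\text{iiv}}$ are shorthand chosen here for the generative error rate and the IIV misclassification rate. This is an informal sketch, and the paper's formal statement may carry additional correction terms that are not reproduced here.

```latex
\text{err} \;\gtrsim\; 2 \cdot \text{err}_{\text{iiv}}
\qquad\text{e.g.}\qquad
\text{err}_{\text{iiv}} = 0.05 \;\Rightarrow\; \text{err} \gtrsim 0.10
```

Read this way, a model that misjudges the validity of 5% of statements would be expected to produce generative errors at a rate of roughly 10% or more.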


Misclassifying statements as “valid” leads to hallucinations
Image by Author | Gemini
# 3. Hallucinations are inevitable
tl;dr: Calibrated base models are mathematically forced to hallucinate, even with error-free training data.
The paper shows that even if the training corpus were perfect and error-free, the process of minimizing the statistical objective during pretraining would still lead the language model to generate errors. This is linked to the concept of calibration. Because errors are a natural consequence of the standard cross-entropy objective, a well-trained base model that is calibrated (meaning its predicted probabilities align with reality) must inevitably generate errors, especially when faced with inherently unlearnable facts. Conversely, a base model that avoids errors must necessarily be miscalibrated (i.e., its uncertainty estimates must be wrong).
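The link between calibration and unavoidable error can be illustrated with a minimal Python sketch. The function names and the binning scheme below are assumptions made for this example and are not taken from the paper. The idea is simply that if a model's stated confidences match reality, then any confidence below 1 implies a matching share of wrong answers.

```python
from collections import defaultdict


def expected_error_if_calibrated(confidences):
    """Expected error rate of a perfectly calibrated model.

    A calibrated model that answers with confidence c is wrong with
    probability 1 - c, so the expected error is the mean of (1 - c).
    """
    return sum(1 - c for c in confidences) / len(confidences)


def calibration_table(confidences, correct, n_bins=5):
    """Compare mean confidence with observed accuracy per confidence bin.

    Returns a list of (mean_confidence, accuracy, count) tuples, one per
    non-empty equal-width bin, ordered from low to high confidence.
    """
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    table = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        table.append((mean_conf, accuracy, len(items)))
    return table


# Hypothetical confidences for three answers: none of them is certain,
# so a calibrated model cannot have an expected error of zero.
print(expected_error_if_calibrated([0.9, 0.6, 0.5]))
```

In a table produced by `calibration_table`, a calibrated model shows mean confidence close to accuracy in every bin. A model that never errs while reporting confidences below 1 would show accuracy above confidence, which is the miscalibration the section describes.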
# 4. Hallucinations are persistent
tl;dr: The persistence of hallucinations is driven by the prevalence of misaligned primary evaluations.
Although post-training techniques often aim to reduce falsehoods, hallucinations persist because the majority of existing, influential benchmarks and leaderboards use binary grading schemes (such as accuracy and pass rate) that penalize abstention and uncertainty. This creates a “socio-technical” problem. If Model A correctly signals uncertainty while Model B always guesses when unsure, Model B will outperform Model A under a 0-1 scoring scheme, reinforcing the hallucination-like behavior of guessing. The dominance of these misaligned evaluations is the root problem, and it cannot be solved by adding a small number of new hallucination-specific evaluations.
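The Model A versus Model B comparison can be sketched in a few lines of Python. Everything here is hypothetical: the function name, the question counts, the 25% guess accuracy, and the `wrong_penalty` parameter are assumptions for illustration, not figures or a scoring rule taken from the paper.

```python
def benchmark_score(n_known, n_unsure, guess_when_unsure,
                    guess_accuracy, wrong_penalty=0.0):
    """Expected benchmark score for a model.

    n_known:           questions the model answers correctly (1 point each)
    n_unsure:          questions the model is unsure about
    guess_when_unsure: True -> always guess; False -> answer "I don't know"
    guess_accuracy:    chance that a guess on an unsure question is right
    wrong_penalty:     points subtracted per wrong answer
                       (0.0 reproduces plain 0-1 grading)
    """
    score = float(n_known)
    if guess_when_unsure:
        expected_right = n_unsure * guess_accuracy
        expected_wrong = n_unsure * (1 - guess_accuracy)
        score += expected_right - wrong_penalty * expected_wrong
    return score


# Both hypothetical models know 70 answers and are unsure about 30.
# Model A abstains when unsure; Model B guesses and is right 25% of the time.
for penalty in (0.0, 1.0):
    a = benchmark_score(70, 30, False, 0.25, wrong_penalty=penalty)
    b = benchmark_score(70, 30, True, 0.25, wrong_penalty=penalty)
    print(f"penalty={penalty}: Model A={a}, Model B={b}")
```

With plain 0-1 grading, the guessing model comes out ahead. Once wrong answers cost something, the ranking in this toy setup flips, which is the kind of incentive change the section argues the primary evaluations would need.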
# 5. The role of arbitrariness
tl;dr: Statistical uncertainty arising from arbitrary facts (low data frequency) is a key driver of pretraining errors.
One of the major statistical factors contributing to pretraining errors is the existence of arbitrary facts: specific, effectively random facts that no simple pattern in the target function explains. Individual birthdays are an example. The analysis shows that for arbitrary facts, the expected hallucination rate is bounded below by the singleton rate, the fraction of facts that appear exactly once in the training data. For example, if 20% of birthday facts appear only once, the model is expected to hallucinate on at least 20% of those facts. Other sources of generative error include poor models (where the model family cannot represent the concept well, as in the letter-counting example) and GIGO (garbage in, garbage out, where models replicate errors from the training data).
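As a rough illustration of the singleton rate, the sketch below counts how many distinct facts occur exactly once in a toy list of fact mentions. The function name, the toy data, and the choice to normalize by the number of distinct facts follow this article's wording and are assumptions; the paper's formal definition may normalize differently.

```python
from collections import Counter


def singleton_rate(fact_mentions):
    """Fraction of distinct facts that appear exactly once.

    fact_mentions: iterable of hashable fact identifiers, one entry per
    occurrence in the (hypothetical) training data.
    """
    counts = Counter(fact_mentions)
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Toy corpus: five people's birthdays, four mentioned twice, one only once.
mentions = [
    "alice:03-14", "alice:03-14",
    "bob:07-02", "bob:07-02",
    "carol:11-30", "carol:11-30",
    "dave:01-09", "dave:01-09",
    "erin:05-21",
]
print(singleton_rate(mentions))
```

In this toy corpus one of the five birthdays is a singleton, so the rate is 20%, matching the example in the text where the model would be expected to hallucinate on at least 20% of such facts.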
# Key takeaways
Several themes tie the paper together.
First, hallucinations are not a mysterious failure. Instead, they arise from ordinary misclassification of validity, the same kind of binary error any classifier makes when it cannot reliably tell true from false.
Second, our dominant evaluation culture implicitly rewards confident guessing by penalizing expressions of uncertainty. As a result, a model that says “I don't know” looks worse on a leaderboard than one that guesses, even when the guesses are wrong.
Third, durable progress will not come from bolt-on patches. Benchmark scoring needs to change so that it values calibrated uncertainty and abstention, and training and deployment then need to be aligned with those incentives.
Something to ponder: what would your information consumption look like if people and machines were rewarded for knowing when not to answer?
Matthew Mayo (@mattmayo13) holds a master's degree in computer science and a graduate diploma in data mining. As editor-in-chief of KDnuggets & Statology, and contributor to Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was six years old.
