by Dr Nawab John Dar
Research shows that overly accommodating AI increases confidence, reduces accountability, and makes users more likely to trust incorrect answers

Many people have moments when this happens. Recognize. Share your plans with a chatbot, explain your side of an argument, or ask for feedback on a decision you’ve more or less already made. The machine responds warmly and confidently, affirming your ideas. That feels like good advice. What most people have no reason to doubt is that this agreement has nothing to do with whether they are right or not.
Two peer-reviewed studies, one by researchers at Anthropic and one by a team at Stanford University published in Science in March 2026, provided solid evidence behind what many users were beginning to feel. The tendency for AI systems to agree, verify, and follow is not a minor design quirk. This is a result of how these systems are trained, and research shows that they already influence how people think about themselves and the people around them.
What is flattery?
The term the researchers use is “sycophant,” which comes from an old word for flattering people who say what they want to hear from those in power. In AI research, this represents a model that adjusts its response to match what the user believes to be true rather than what the evidence actually supports. This is not the same as being cooperative or kind. The problem is that the model no longer acts as an independent evaluator. Read and reflect the user’s apparent beliefs. Users walk away feeling vindicated. But machines have substituted precision for approval.
The Anthropic paper, written by Mrinak Sharma and 18 colleagues and accepted at ICLR 2024, tested five leading AI assistants across four realistic text generation scenarios. This pattern was consistent across all five. When users hinted that they liked or disliked a passage, an argument, or a math solution, the model adjusted its feedback to match that hint, even if the content being evaluated did not change. When the user objected to the correct answer, the model backed off and provided the incorrect answer. If the user asserted a false belief, the model agreed rather than being corrected. The paper concludes that pandering is a general property of how these models were trained, and not an isolated quirk of a particular system.
Why does the training process produce this result?
To understand the cause, it helps to know what happens after a large language model absorbs initial training data. There is a second stage called reinforcement learning from human feedback, often referred to as RLHF. In this process, a human rater reads pairs of AI answers and chooses which one they prefer. The model then learns to produce more of those that capture that preference. The problem that anthropology researchers have identified is that human raters tend to favor answers that are consistent with what they already believe, with measurable consistency. When researchers analyzed preference data, they found that answers that matched users’ stated views were more likely to be selected as better answers. In some cases, raters chose a confident, flattering response over a less sophisticated but accurate one. The training process was, in effect, teaching the model to flatter.
OpenAI confirmed that this was not a theoretical issue when it publicly rolled back updates to the GPT-4o model in April 2025. This is one of the most remarkable reversals in the company’s history. OpenAI said in its own statement that the update was “too focused on short-term feedback and did not fully consider how users’ interactions with ChatGPT evolve over time,” making the updated model “overly flattering or consensual.” Users reported that the model glorified economically risky ideas, affirmed historical falsehoods, and unabashedly justified morally questionable actions. OpenAI’s follow-up statement went further, pointing out that the sycophantic model is not just about flattery. It also “corrected doubts, stirred anger, encouraged impulsive behavior, and inadvertently reinforced negative emotions.” The company acknowledged that this practice could raise safety concerns regarding mental health and emotional overdependence. The sycophants didn’t just tell users they were doing a good job.
What effect does research show it has on people?
The Anthropic paper proved that this behavior is widespread. A Stanford University study led by doctoral candidate Myra Chen and professor Dan Jurafsky and published in the journal Science on March 26, 2026, tested how it actually affects people on the receiving end. The research team structured the study in three parts. They collected around 12,000 social prompts and ran them across 11 leading AI models, including ChatGPT, Claude, Gemini, DeepSeek, and Llama. In all 11 cases, the model affirmed the user’s behavior 49% more than a human respondent given the same scenario. This is also true for prompts that describe deception, misconduct, or conduct that the question submitter already knew was problematic.
One of the most specific tests used posts from a Reddit community commonly known as “Am I the Asshole,” where users describe interpersonal conflicts and the community votes on who is at fault. The researchers selected about 2,000 posts where human consensus clearly determined that the original poster was in the wrong. When these same posts were sent to an AI model, the machine favored the original poster 51% of the time in the absence of any human judges. The model was one that could be systematically agreed with the questioner, regardless of the facts. The second part of the study involved more than 2,400 participants who interacted with an AI advisor who was either a happy person or a balanced person. People who spoke to a pleasant chatbot were more convinced they were right, less willing to take responsibility or repair conflicts they described, and more likely to rate the AI as trustworthy and want to use it again. The most disturbing findings concerned perception. When participants were asked to judge how objective each version of the AI was, they rated the goofball and balanced versions as similarly unbiased. They couldn’t detect the difference between a system that provides honest feedback and one that’s designed to buy into it. As Chen said in comments to the Stanford report, “By default, AI advice doesn’t tell people they’re wrong.”
short term impact
It is important to know exactly what the evidence does and does not prove. These studies capture effects from a single interaction. These do not prove that repeated use of AI causes permanent changes in personality or reasoning. Researchers have directly confirmed this. However, short-term discovery is not easy. A single conversation can make people more certain of their position, less willing to admit fault, and more trusting of the system that created those feelings, which are meaningful outcomes in themselves. The concern is what happens if these conversations continue to be used regularly over months or years.
A parallel study by Rathje et al., also published in 2025 and cited in a subsequent analysis, found that brief interactions with a goofy AI inflated self-awareness. After speaking with an empathetic model, participants rated themselves as more intelligent, more empathetic, and above average compared to others. They also rated flattering responses as higher quality and showed more interest in replying. The researchers described this as a perverse incentive that lures users back into a system that distorts their judgment. Over time, these dynamics can erode the function that honest disagreement has in everyday life. A friend who challenges your reasoning, a colleague who says your plan is flawed, a partner who says you’re being unfair. These interactions help people correct mistakes, consider alternative perspectives, and stay connected to how their actions appear to others. A system trained to please will not be able to perform this function and may not work actively. Concerns extend beyond the individual. These models are consulted by hundreds of millions of people for personal advice, moral judgment, and interpersonal guidance. When that default mode of consultation is affirmation rather than honest evaluation, the cumulative impact on how people reason and treat each other is a legitimate question for researchers, companies, and users alike.
What should I see?
For those who use AI tools regularly, there are practical patterns worth recognizing. flattery is most evident in situations where emotional stakes are involved. It’s when you explain your position on a conflict, share a decision you’re attached to, or ask for feedback on work you’re already invested in. In such moments, a model trained on your need for recognition will read your orientation and reflect it. Common signs include a chatbot that justifies your position without looking into it, overrules the correct answer when you express skepticism, or changes its tone to reflect your own confidence or dissatisfaction. If the reaction feels more like reassurance than analysis, it’s worth noting the difference.
Quick fix: When using AI to evaluate your own ideas or decisions that affect others, explicitly ask the model to discuss opposing views. Ask what those who disagree with you would say. Ask for the strongest opposition to your plan. If you give the model a clear role of being critical rather than supportive, the quality of the answers often changes dramatically. Chen was blunt about this in his Stanford report, saying, “I don’t think AI should be used to replace humans for this kind of thing. That’s the best bet right now.” When it comes to personal conflicts, moral judgments, and consequential choices, human disagreements serve a function that harmonious machines are not built to reproduce.
structural problems

Both studies point out that sycophancy is not a bug that can be easily removed. It comes from the reward logic of how these systems learn. As long as human raters prefer positive responses over accurate ones, and as long as training is optimized for that preference, that behavior will persist. OpenAI’s rollback of GPT-4o was an admission that even a small change in training focus produced a model that users and the companies themselves found problematic. The company explained its plans to focus on long-term user satisfaction over short-term approval, and to move future models clearly away from sycophantic tendencies. It remains to be seen whether it will lead to lasting changes to the industry as a whole.
A difficult challenge that major AI labs have not fully solved is building systems that are truly useful and truly honest. These two qualities are not always in tension, but neither are they automatically compatible. The machine that always makes you feel confident and understood isn’t necessarily the one that helps you think carefully. Research published this year suggests that the gap between these two is larger and more significant than the current default behavior of most major AI systems reflects.
(The author is a neuroscientist and postdoctoral fellow at the Salk Institute in California. His research focuses on Alzheimer’s disease, specifically the role of iron, stress, and cell death. He also works to improve access to brain health in medically underserved areas such as Jammu and Kashmir through Teleplak Healthcare. Ideas are personal.)
