Study finds AI chatbots still have problems with news accuracy

A month-long experiment has raised new concerns about the reliability of generative AI tools as news sources, after Google’s Gemini chatbot was found to have fabricated entire news outlets and produced false reports. The findings were first published in The Conversation.

The experiment, led by a journalism professor specializing in computer science, tested seven generative AI systems over four weeks. Each day, every tool was asked to list and summarize the five most important news events in Quebec, rank them by importance, and provide direct links to the source articles. The seven systems were Google’s Gemini, OpenAI’s ChatGPT, Anthropic’s Claude, Microsoft’s Copilot, xAI’s Grok, DeepSeek, and Opera’s Aria.


The most notable failure involved Gemini inventing a fictitious news organization, fictif.ca, and incorrectly reporting that there was a school bus driver strike in Quebec in September 2025. In reality, the disruption was caused by the withdrawal of Lion Electric buses due to technical issues. Nor was this an isolated case: across the 839 responses collected during the experiment, the AI systems regularly cited fictitious sources, provided broken or incomplete URLs, and misrepresented actual reports.

These findings matter because more and more people are already turning to AI chatbots for news

According to the Reuters Institute Digital News Report, in 2024, 6% of Canadians relied on generative AI as a news source. When these tools hallucinate facts, distort reporting, or fabricate conclusions, they risk spreading misinformation, especially when answers are confidently presented without clear disclaimers.

For users, the risks are real and immediate. Only 37% of responses included a complete and legitimate source URL. Fewer than half of the summaries were completely accurate; many were only partially accurate or subtly misleading. In some cases, the AI tools added unsubstantiated “generated conclusions,” claiming that articles “rekindled the debate” or “highlighted tensions” when the human-written sources said nothing of the sort. Such additions may sound insightful, but they create a narrative that simply doesn’t exist.


The errors were not limited to fabrication

Some tools distorted real stories, misreporting the treatment of asylum seekers or misidentifying the winner of a major sporting event. Others made basic factual errors about polling data or personal details. Taken together, these issues show that generative AI still struggles to summarize the news without inventing context.

Looking ahead, the concerns raised by The Conversation are consistent with broader industry reviews. A recent report by 22 public service media organizations found that nearly half of AI-generated news responses contained significant problems, ranging from sourcing errors to outright inaccuracies. As AI tools become more integrated into search and everyday information habits, these findings carry a clear warning: when it comes to news, generative AI should be treated at best as a starting point, not a reliable source of record.


