I used AI chatbots as a news source for a month, and they were unreliable and error-prone

It was cute. But it was still a lie. Gemini invented a news outlet that does not exist and gave it a name that translates to "fake example.ca" (examplefictif.ca in French).

According to Google's generative AI system, this fictional outlet reported that a school bus drivers' strike had been called in Quebec on September 12, 2025. But that was not why school transportation was disrupted that day. The real reason was that Lion Electric buses had been withdrawn from service because of technical issues.

This journalistic hallucination is probably the worst fabrication I saw in an experiment that lasted about a month. But I found many others.

Relying on AI chatbots for news

As a journalism professor specializing in computer science, I have been using AI since long before ChatGPT arrived in 2022. The latest Digital News Report from the Reuters Institute for the Study of Journalism found that in 2024, 6 per cent of Canadians included generative AI chatbots among their news sources.

I was curious to see how accurately these tools could tell me what was going on in my area. Would they give me hard facts and “news stories”?

Every morning last September, I asked seven generative AI systems the same open-ended question (in French):

“What are the five most important news stories in Quebec today? List them in order of importance. Summarize each one in three sentences. Add a short title. Provide at least one source for each (the specific URL of the article, not the home page of the news outlet you used). You may search the web.”

I used three tools that I paid for (ChatGPT with the GPT-5 Auto model, Claude with the Sonnet 4.5 model, and Gemini with the 2.5 Pro model), one tool provided by my employer (Copilot with the GPT-4 architecture), and three tools in their free versions (DeepSeek, Grok, and Aria, the tool built into the Opera web browser).

Dubious and sometimes imaginary sources

Throughout the month, I recorded 839 responses and first categorized them based on the sources provided. I wanted news, so I expected AI tools to tap into news media.

However, in 18 per cent of cases they did not do so and instead relied on government websites or lobby groups, or invented imaginary sources such as the examplefictif.ca outlet mentioned above.

Although most news organizations block generative AI crawlers, the majority of the answers I received cited news organizations. However, in many cases the URL provided led to a 404 error (the URL was incorrect or fabricated) or pointed to the news organization's home page or one of its sections (I labeled these cases “incomplete URLs”). This made it difficult to confirm whether the news provided by the AI tools was reliable.

Only 37 per cent of responses provided the full canonical URL.
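For readers who want to repeat this kind of check, the three outcomes described above (404 error, incomplete URL, full URL) can be sketched in Python. This is only an illustration under stated assumptions, not necessarily the procedure used in this experiment: the function names are hypothetical, the HTTP status code is assumed to have been collected separately, and treating any URL with at most one path segment as a home page or section page is a rough guess that will misjudge some sites.

```python
from collections import Counter
from urllib.parse import urlparse


def classify_source_url(url, status_code):
    """Label one cited URL, given the HTTP status code seen when opening it.

    Hypothetical helper. Assumption: a path with zero segments is a home
    page and a path with one segment is a section page; both are counted
    as "incomplete URL".
    """
    if status_code == 404:
        return "404 error"
    path_segments = [part for part in urlparse(url).path.split("/") if part]
    if len(path_segments) <= 1:
        return "incomplete URL"
    return "full URL"


def share_by_label(labels):
    """Return the percentage of responses carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up URLs on example.com, used purely for illustration.
labels = [
    classify_source_url("https://example.com/", 200),
    classify_source_url("https://example.com/politics", 200),
    classify_source_url("https://example.com/politics/2025-09-12/article", 200),
    classify_source_url("https://example.com/politics/2025-09-12/missing", 404),
]
print(share_by_label(labels))
```

A heuristic like this only sorts links into categories. Confirming that an article actually says what the chatbot claims still requires reading it.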

The summaries generated by the AI systems were accurate in 47 per cent of cases, including four cases of outright plagiarism. Just over 45 per cent of the answers were only partially accurate.

More on this later. First, it is important to discuss responses that were incorrect in whole or in part.

Content errors

The worst mistake I found was definitely made by Grok. The generative AI tool offered by Elon Musk's social network X told me that asylum seekers “[were] abused in Chibougamau,” in northern Quebec:

“About 20 asylum seekers were sent to Chibougamau from Montreal, but the majority returned quickly due to inadequate conditions. They ironically report being treated like 'princes and princesses,' but in reality they lack support. This incident calls into question Quebec's refugee management.”

Grok cited a La Presse article published that day. But it twisted the story. In fact, La Presse reported that the trip was a success: of the 22 asylum seekers, 19 were offered jobs in Chibougamau.

Other examples of inaccuracy:

When a toddler was found alive in June 2025 after a grueling four-day search, Grok falsely claimed the child's mother had left her daughter along a highway in eastern Ontario “to go on vacation.” This was not reported anywhere.
Aria told me that French cyclist Julian Alaphilippe had won the Grand Prix Cycliste de Montréal, an annual bicycle road race. This was not true. Alaphilippe won a similar race in Quebec City two days earlier. American Brandon McNulty won in Montreal.
Grok also claimed that, in a Léger poll, “the [provincial] Liberal Party maintains a stable lead.” In fact, the Quebec Liberal Party was in second place at the time, behind the Parti Québécois.

I also noticed a lot of spelling and grammar mistakes in the French answers. If the tools had answered in English, there might have been fewer.

I mentioned earlier that about 45 per cent of the verifiable answers were only partially reliable. Among these answers, I found many misinterpretations that, while incorrect, were not serious enough to classify the answers as unreliable.

For example, the Chinese AI tool DeepSeek told me that “apple season in Quebec” is “great.” The article on which this claim is based paints a more nuanced picture. “The season is not over yet,” said an orchard owner quoted in the article.

ChatGPT repeated the same strange statement two days in a row, writing that Mark Carney is “Quebec's most popular federal prime minister.” Of course, he's the only one.

Generative conclusions

In most cases, I classified news items as “partially reliable” because of the various conclusions drawn by the generative AI tools.

For example, Grok and ChatGPT both covered a story about $2.3 million in emergency construction work being done on Quebec City's Pierre Laporte Bridge. Grok's final sentence was: “This highlights the challenges of maintaining Quebec's critical infrastructure.” Meanwhile, ChatGPT wrote that the news “highlights the contradiction between budget constraints, planning, and public safety.”

None of this is wrong. Some may find this kind of contextualization helpful. Nevertheless, these conclusions are not supported by any sources and no one cited in the referenced article said so.

As another example, ChatGPT concluded that an accident north of Quebec City had “reignited the debate about provincial road safety.” No such debate was reported in the article cited by the AI tool. As far as I know, no such debate exists.

I found similar conclusions in 111 stories generated by the AI systems I used. They often included expressions such as “the situation is being highlighted,” “the debate is being reignited,” “there is tension,” and “questions are arising.”

In no case did I find any humans mentioning the tensions or debates reported by the AI tools. These “generative conclusions” appear to conjure up debates that don't exist, and they pose a risk of misinformation.

Tread carefully

A few days after I published the French version of this article, a report by 22 public media organizations was published showing similar results.

The study found that “nearly half of all AI responses had at least one significant issue, [that] one-third of responses showed serious sourcing issues [and that] one-fifth contained significant accuracy issues, including hallucinations and outdated information.”

When we seek news, we should expect generative AI tools to be true to the facts. This is not the case, so those using AI as a source of trusted information should tread carefully.

The author recorded the answers given each morning by the generative AI tools in a Google Sheet file (in French).
