The chatbot, built by Elon Musk’s company xAI, which owns the social media platform X, doubled down after users pushed back against the false claims.
“My previous answer is accurate,” Grok wrote.
A few days earlier, Grok responded to a user’s inquiry about a video that showed hospital staff restraining and beating a patient in an elevator.
Someone asked Grok to confirm the location of the video, which they claimed showed an incident at Toronto General Hospital in May 2020 that resulted in the death of 43-year-old Danielle Stephanie Wariner.
“If it’s Canada, why does the uniform have Russian written on it?” asked one user.
Grok said the uniform was “the standard green attire for security at Toronto General Hospital” and that the video depicts “an entirely Canadian event.”
THE FACTS:
A reverse image search using still images from the video turns up multiple news articles from Russian media dating to August 2021.
Translated into English, those reports indicate that the video, first circulated on the Telegram channel Mash, shows an incident in the Russian city of Yaroslavl.
According to the reports, the Yaroslavl Regional Psychiatric Hospital fired two staff members who were seen in leaked surveillance footage attacking a woman after leading her into an elevator in a residential building.
The 2020 incident at Toronto General Hospital that Grok referred to was also partially captured on video, which shows part of the interaction between Wariner and security staff. The staff were charged with manslaughter and criminal negligence after Wariner died following the interaction, but the charges were later dropped.
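The reverse image search described above is straightforward to reproduce. Below is a minimal sketch, assuming Python with OpenCV installed and a local copy of the clip saved as video.mp4 (a hypothetical filename): it extracts one still frame per second of footage, and each still can then be uploaded to a reverse image search service such as Google Images or TinEye.

```python
import cv2  # OpenCV: pip install opencv-python

# Hypothetical local copy of the video being checked.
VIDEO_PATH = "video.mp4"

capture = cv2.VideoCapture(VIDEO_PATH)
fps = capture.get(cv2.CAP_PROP_FPS) or 30  # fall back if FPS is unreadable

frame_index = 0
saved = 0
while True:
    ok, frame = capture.read()
    if not ok:
        break
    # Save roughly one still image per second of video.
    if frame_index % int(fps) == 0:
        cv2.imwrite(f"still_{saved:03d}.png", frame)
        saved += 1
    frame_index += 1

capture.release()
print(f"Saved {saved} stills for reverse image search")
```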
Mr. Carney is indeed prime minister, and has been since winning the Liberal leadership race in March, followed by the Liberals’ victory in the general election on April 28.
In both cases, Grok eventually corrected the mistake after some prompting from users. But why did Grok keep repeating falsehoods and then double down when it was corrected?
Grok and other chatbots, such as ChatGPT and Google’s Gemini, are built on large language models, or LLMs, which learn to recognize and generate text by training on vast amounts of text from the internet.
Large language models are “primarily just trained to predict the next word in a sentence, much like autocomplete on a cellphone,” said Vered Shwartz, assistant professor of computer science at the University of British Columbia and a CIFAR AI Chair at the Vector Institute.
“They’re exposed to a lot of text online, so they learn to produce text that is fluent and human-like. They also pick up a lot of knowledge about the world and about the things people discuss online, and they’re usually able to give factually correct answers,” she said.
When they produce false information, it is known as a “hallucination,” and researchers say it is inevitable because of the way language models are trained.
“They have no concept of truth … they just generate the next word that is statistically most likely,” Shwartz said.
“The result is text with human-like fluency, often written in a very authoritative manner. But it does not always reflect the information learned from the web; in some cases, models may overgeneralize or mix up facts into statements that are not true,” she said.
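To see what “predicting the statistically most likely next word” means in practice, here is a deliberately tiny sketch in Python. It is not how Grok actually works; real LLMs use neural networks trained on web-scale data, while this toy bigram model just counts which word follows which in a few made-up sentences. But it illustrates the core point Shwartz makes: the model emits whatever continuation is most frequent, with no step anywhere that checks whether the output is true.

```python
from collections import Counter, defaultdict

# A made-up "training corpus" (an assumption for illustration only;
# real models train on vast amounts of web text, not three sentences).
corpus = (
    "the hospital staff said the video was real "
    "the video was filmed in russia "
    "the hospital said the patient was fine"
).split()

# Bigram model: count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(start, length=8):
    """Repeatedly emit the statistically most likely next word.

    There is no fact-checking step anywhere: the model only knows
    which word tended to follow which in its training text.
    """
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# Prints "the hospital staff said the hospital staff said the":
# fluent-looking fragments stitched together by frequency, not truth.
print(generate("the"))
```

Scaled up by many orders of magnitude, the same mechanism produces the fluent, authoritative-sounding text Shwartz describes, along with the same indifference to whether that text is accurate.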
The quality of a large language model depends in part on the quality of the data it is trained on. Although most models are proprietary, they are generally understood to be trained on large swaths of the web.
But while the models may differ slightly, hallucinations are inherent to all of them, not just Grok, Shwartz said.
Grok has multimodal capabilities, allowing it to respond to text queries and analyze videos. Shwartz said that while the model can relate what it sees in a video to an explanation in text, it is “not trained to do any kind of fact checking; it’s just trying to understand what’s going on in the video and answer questions based on that.”
She added that the models are trained on common online discussion styles, which can lead them to double down on incorrect answers. Some companies may also tune their chatbots to sound more authoritative or to be more agreeable with users.
It’s becoming increasingly common for people to rely on these chatbots to verify information they see online, which Shwartz said is a “concern.”
Internet users tend to anthropomorphize chatbots that are designed to imitate human language, leading to overconfidence in the ability of large language models to verify information, Shwartz said.
“They’re used to humanizing [chatbots], so they say, ‘Oh, they must be confident because they doubled down,’” she said.
“The premise that people will use [large language models] to do fact checking is flawed … they don’t have the ability to do that.”
This report by The Canadian Press was first published Nov. 25, 2025.
Marissa Barney, The Canadian Press

