AI chatbots struggle to detect AI-generated videos



Leading artificial intelligence chatbots, such as OpenAI’s ChatGPT, Google’s Gemini, and xAI’s Grok, are largely unable to identify AI-generated videos, according to a new report from media watchdog NewsGuard.

The report, released on January 22nd, found that the chatbots failed to recognize videos created with OpenAI’s text-to-video tool Sora 78% to 95% of the time, raising concerns about the reliability of AI tools for verifying visual content online.

NewsGuard tested the three chatbots on 20 Sora-generated videos depicting provably false claims drawn from its False Claim Fingerprints database. The videos were evaluated both with and without visible watermarks, using prompts such as “Is this real?” and “Is this AI-generated?”


According to the report, Grok failed to identify unwatermarked Sora videos as AI-generated in 95% of tests, ChatGPT in 92.5%, and Gemini in 78%. ChatGPT’s results were particularly striking given that ChatGPT and Sora are both OpenAI products.

According to NewsGuard, OpenAI did not respond to questions about why ChatGPT struggled to recognize videos generated by the company’s own technology.

Sora’s videos are watermarked by default, but NewsGuard noted that this safeguard is easy to circumvent. Shortly after Sora’s release in February 2025, several third-party tools emerged offering free watermark removal. NewsGuard used one such tool to strip the watermarks before testing the videos.

Even when the watermark was present, the chatbots were not fully reliable. Grok failed to identify watermarked Sora videos as AI-generated 30% of the time, and ChatGPT failed in 7.5% of cases. Gemini correctly identified all watermarked videos during testing.

In some instances, the chatbots confidently vouched for fabricated videos. NewsGuard cited ChatGPT and Gemini describing a fake Sora video, purporting to show U.S. immigration officers arresting a 6-year-old child, as consistent with real news events. In another case, all three chatbots affirmed the authenticity of a fake video purporting to show a Delta Air Lines employee ejecting a passenger for wearing a political hat.

The report also found that the chatbots rarely disclosed their limitations to users. ChatGPT acknowledged that it could not detect AI-generated content in just 2.5% of its tests, while Gemini and Grok did so in 10% and 13% of tests, respectively. Instead, the models often gave confident but inaccurate assessments.

In response to a NewsGuard inquiry, Nico Felix, OpenAI’s head of product and application communications, acknowledged that “ChatGPT cannot determine whether content is generated by AI,” but did not explain why this limitation is not consistently communicated to users.

Google said Gemini’s verification tools currently apply only to content generated by Google’s own AI systems. “At this time, we are only announcing validation of Google AI-generated content,” Elijah Lawal, Gemini communications manager, told NewsGuard.

NewsGuard warned that the findings highlight the growing risks as AI-generated videos become increasingly realistic and widely shared online, especially during elections, conflicts and breaking news. The group pointed out that relying on chatbots to verify content could unintentionally amplify misinformation instead of suppressing it.


Nurudeen Akewushola is a fact-checker at FactCheckHub. He has authored several fact-checks that have contributed to the fight against information disorder. He can be reached at nyahaya@icirnigeria.org or on Twitter @NurudeenAkewus1.


