Realistic AI media in 2025 and challenges in 2026

Over the course of 2025, deepfakes improved dramatically. AI-generated faces, voices and full-body performances that mimic real people rose in quality far beyond what even many experts expected just a few years ago. They were also increasingly used to deceive people.

In many everyday scenarios, especially low-resolution video calls and media shared on social media platforms, the realism is now high enough to reliably fool nonexpert viewers. For ordinary people and, in some cases, even for institutions, synthetic media has become indistinguishable from authentic recordings.

And the surge is not limited to quality. The number of deepfakes is exploding. Cybersecurity companies estimate that online deepfakes grew from about 500,000 in 2023 to about 8 million in 2025, an annual growth rate of nearly 900 percent.

I'm a computer scientist who researches deepfakes and other synthetic media. From my vantage point, the situation is likely to get worse in 2026, as deepfakes become synthetic performers that can react to people in real time.

Dramatic improvements

Several technical shifts underlie this escalation. First, video realism has taken a significant leap thanks to video generation models designed specifically to maintain temporal consistency. These models produce videos with coherent motion, consistent identities for the people depicted, and content that makes sense from one frame to the next. The models disentangle the information that represents a person's identity from the information about motion, so that the same motion can be mapped onto different identities, or the same identity can perform many different kinds of motion.

These models produce stable, coherent faces, free of the flicker, warping or structural distortions around the eyes and jawline that once served as reliable forensic evidence of deepfakes.

Second, voice cloning has crossed what I would call the "indistinguishability threshold." A few seconds of audio is now enough to generate a convincing clone, complete with natural intonation, rhythm, emphasis, emotion, pauses and breathing noise. This capability is already fueling fraud at scale: some major retailers report receiving AI-generated scam calls every day. The perceptual tells that once gave away synthesized speech have largely disappeared.

Third, consumer tools have pushed the technical barrier almost to zero. Upgrades to tools from OpenAI and Google, along with a wave of startups, mean that anyone can describe an idea, have a large language model such as OpenAI's ChatGPT or Google's Gemini draft a script, and turn it into finished audio and video. AI agents can automate the entire process. The capacity to generate coherent, story-driven deepfakes at scale has effectively been democratized.

This combination of surging volume and personas that are nearly indistinguishable from real humans poses a serious problem, especially in a media environment where attention is fragmented and content spreads faster than it can be verified. Deepfakes that spread widely before anyone realized what was happening have already caused real-world harm.

The future is real time

Looking ahead, the trajectory for next year is clear. Deepfakes are moving toward real-time synthesis that can produce video closely resembling the nuances of a person's appearance, making it easier to evade detection systems. The frontier is shifting from static visual realism to temporal and behavioral coherence: models that generate live or near-live content rather than pre-rendered clips.

Identity modeling is converging into unified systems that capture not just what a person looks like, but how they move, sound and speak. The result goes beyond "this looks like person X" to "this behaves like person X over time." I expect to see entire video-call participants synthesized in real time; interactive AI-driven actors whose faces, voices and mannerisms adapt instantly to a prompt; and scammers deploying responsive avatars instead of fixed videos.

As these capabilities mature, the perceptual gap between synthetic and authentic human media will keep narrowing. The meaningful line of defense will shift away from human judgment and will instead depend on infrastructure-level protections. These include secure provenance, such as cryptographically signed media, and AI content tools that follow content provenance specifications. They will also depend on multimodal forensic tools, such as those developed in my lab.

It's no longer enough to just look closely at pixels.

Professor of Computer Science and Engineering; Director, UB Media Forensic Lab

This article is republished under a Creative Commons license. Read the original article.
