AI video gets real

AI Video & Visuals


Will Smith eating spaghetti has become one of technology's oddest success stories.

In 2023, an AI-generated video of the actor devouring pasta went viral for all the wrong reasons. The clip, created by an early AI model called ModelScope, showed a nightmarish figure who only vaguely resembled Smith, with impossible hand movements and a distorted face, grotesquely slurping noodles. It was so obviously fake and unsettling that Smith himself parodied it about a year later, turning the AI failure into a meme.

That horrifying pasta clip has since become an unofficial benchmark for AI video progress, a standard test developers and researchers use to measure how far the technology has come. It is the equivalent of asking a chatbot to take the LSAT or solve a math problem.

Then last month, Google unveiled the latest version of its text-to-video model, Veo 3, which can generate a convincing Will Smith doppelgänger. The only problem? The AI believes spaghetti makes crunching noises, like eating potato chips. Going from digital horror show to a few audio quirks reveals how far the technology has traveled in just two years.

The journey from spaghetti nightmare to persuasive deepfake happened through a series of rapid breakthroughs in 2024. OpenAI's Sora, released that year, could produce smooth, cinematic footage but remained silent, making its output essentially high-quality GIFs. Meta's Movie Gen brought better character consistency and longer clips. Google's Veo 2 improved on both but still couldn't produce sound. Each model represented incremental progress, yet none prepared observers for Veo 3's sudden integration of synchronized audio, realistic dialogue, and ambient sound effects.

This is not the steady march of technological advancement we are used to. It's a cliff jump that has left experts, filmmakers, and society at large scrambling to understand what happened. The sudden leap from obviously fake AI videos to synthetic content that is barely distinguishable from reality represents one of the most dramatic capability jumps in recent technological history.

One place where the technology has been embraced is Hollywood. Media executives who a few years ago sat nervously in conference audiences taking notes on AI experiments are now debating publicly how to put these tools to active use. Amazon Studios recently spoke openly about integrating generative AI into its creative pipeline, marking what one industry insider called a “come-to-Jesus moment”: the technology has become too useful to ignore. The shift makes sense. When a day of shooting in Los Angeles can cost $200,000 and traditional VFX houses are shutting down, AI is not just an innovation, it's survival.

But the real disruption isn't happening in studio boardrooms. It lies in the wholesale democratization of sophisticated video manipulation. What once required a team of VFX artists, expensive software, and a Hollywood budget can now be achieved by anyone with $1.50 and an internet connection. Veo 3's pricing structure puts compelling fake videos within reach of nearly everyone, breaking down barriers that previously served as a natural safeguard against widespread media manipulation.

The threat has already materialized. Since 2023, Tom Hanks has repeatedly warned his Instagram followers about videos that misuse his likeness to promote miracle cures and dubious drugs. The Department of Homeland Security has identified deepfakes as an “increasing threat,” noting that making synthetic media effective requires no particular technical expertise. The latest leap in video quality only accelerates the problem, making deceptive content cheaper, faster, and more accessible to produce.

The technology still has limitations. While the viral demonstrations circulating online look flawless, deeper experimentation reveals that Veo 3 struggles with consistency and often ignores prompts entirely. The best models also include guardrails that prevent them from generating videos of recognizable people. But the pace of progress suggests even the current quirks will soon disappear, and guardrails have a way of being dismantled, leaving behind AI content that is functionally indistinguishable from reality.

The question is no longer whether we can trust what we see and hear, but whether we can trust whoever is showing it to us. In an age where sophisticated video manipulation costs less than a cup of coffee, credibility attaches to the messenger rather than the medium. The sudden maturation of AI video has forced us to rebuild our assumptions about trust, compressing what should have been a decade-long social adaptation into an immediate verification crisis.

– Jackie Snow, Contributing Editor
