Netflix now also has its own video AI. • The Register



Netflix’s new model promises to rewrite the way movies are made. Just imagine: you’re the director of the multi-million-dollar blockbuster Car Crash III: Sudden Nest Impact, and you’ve just finished filming the finale, in which the lead, Cruise, drives straight into a head-on collision.

The collision is spectacular. Cruise’s car, which was operated by remote control, explodes on impact, sending debris flying across the highway. It’s wonderful. You high-five Cruise, who has been standing beside you at the camera monitoring station, and he heads off to the craft services truck, his career in the lucrative franchise at an end.

Producer Maya Cash grabs you by the shoulders. “You don’t want to hear this,” she says. “But what if Cruise rides off into the sunset? What if he doesn’t die after all?”

You stop and stare at her over the rim of your Balenciaga sunglasses. “So they’re going to fund a fourth one after all?”

Netflix’s VOID model was built for that moment. Instead of reshooting the scene or redoing it entirely with computer graphics, you can simply transform the crash footage into an open-road denouement.

VOID stands for Video Object and Interaction Deletion. It is a vision-language model (VLM) that can not only erase objects from a scene, but also correct how the objects that remain behave once the removed object is no longer there to affect them.

For example, a head-on collision between two vehicles can be turned into a scene of one vehicle on the road by removing one vehicle and generating a video that shows a physically plausible path for the remaining vehicle. All post-impact debris, smoke, and flames will be erased and replaced with the original pavement.

The creators of the video model – Saman Motamed (Netflix/Sofia University), William Harvey (Netflix), Benjamin Klein (Netflix), Luc Van Gool (Sofia University), Zhuoning Yuan (Netflix), and Ta-Ying Cheng (Netflix) – describe VOID in a preprint paper [PDF] as “a video object removal framework designed to perform physically plausible inpainting in these complex scenarios.”

It can delete objects and model how the remaining objects behave in the absence of the deleted object. So given a scene in which a person jumps into a pool and splashes water onto the ground, VOID can remove that person and produce a video with no splash in the pool and no water on the ground, leaving the pool apparently undisturbed.

VOID isn’t just limited to Netflix titles. The company has published its model on Hugging Face, and anyone can install it.

Other tools for modifying videos include Runway, Generative Omnimatte, DiffuEraser, ROSE, MiniMax-Remover, and ProPainter. However, the paper’s authors claim that VOID is significantly better than these alternatives. In a survey of 25 people across multiple scenarios, VOID was preferred 64.8% of the time, with Runway a distant second at 18.4%.

“Through extensive evaluation against inpainting and text-guided video model baselines based on synthetic and real-world data, we have shown that VOID excels in modeling the complex dynamics following object removal,” the authors claim.

Whether the world really needs more convincing video manipulation is another question. ®


