Fine-tuning AI video models draws early interest from film and TV studios

Media and entertainment companies are now looking to fine-tune video generation models to create custom model versions for their own internal use, including for specific productions.

Fine-tuning refers to the process of training a pre-trained AI model on a curated dataset to create a new, more specialized model that produces a more specific type of output. Fine-tuning for image and video generation is rarely discussed and not well understood, as fine-tuning large language models on text is far more common in the enterprise.
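To make the mechanics concrete, below is a minimal, self-contained PyTorch sketch of the core idea: freeze most of a pre-trained network and update only a small part of it on a curated dataset. The toy model and random tensors are stand-ins for illustration only; nothing here reflects Runway's or any other vendor's actual architecture or pipeline.

# Minimal fine-tuning sketch. The tiny "backbone" stands in for a large
# pre-trained video generator; this is illustrative, not any vendor's code.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
head = nn.Linear(128, 128)  # the small part adapted to the studio's style

# Freeze the backbone so its general knowledge is preserved; only the
# head's weights move toward the curated footage.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-5)

for _ in range(100):
    features = torch.randn(4, 128)  # stand-in for encoded studio clips
    target = torch.randn(4, 128)    # stand-in for the desired style target
    loss = nn.functional.mse_loss(head(backbone(features)), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In a real pipeline the batches would be encoded clip-and-caption pairs from the studio's archive, and the trainable portion would be chosen differently, but the freeze-most, train-little pattern is the essence of fine-tuning.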

These customizations, which “off-the-shelf” video generation models can't achieve, are expected to enable studios to create entirely new “footage”: sophisticated VFX-like or camera-like shots that more closely match the look of a particular film. For example, a model trained on the “Star Wars” films might produce output that matches the franchise's worlds, such as the deserts of Tatooine, where Anakin Skywalker was born.

Runway is currently in the early stages of working with corporate clients, including film and TV studios and media and advertising agencies, to customize or fine-tune its latest video model, Gen-3, said Cristobal Valenzuela, the company's CEO and co-founder.

“I think companies and studios that have been put off by the inability of the models to fully generate hyperrealistic content are realizing that those concerns are being addressed,” Valenzuela says, “so they're coming back.”

Runway's blog post announcing the model also mentioned “industry customization”: working and partnering with entertainment and media organizations to create custom versions of Gen-3. The company released an alpha version of Gen-3 last month; the full model is due later this year and is expected to perform much better on various benchmarks.

For now, Runway appears to be the only video model developer offering fine-tuning to companies. Other companies that develop their own video models may also begin offering fine-tuning at the enterprise level.

That capability is perhaps most relevant for OpenAI, which has been talking to Hollywood studios and creators while testing Sora. Pika told VIP+ in May that fine-tuning is "on the table." Luma AI co-founder and CEO Amit Jain said the company will consider fine-tuning as a feature for Dream Machine but will decide whether it's needed based on user feedback.

Runway's Valenzuela said Gen-3 customization will be offered separately for enterprise customers who have large amounts of data they can use to train their own versions of the models.

He said some of the companies looking to customize Gen-3 have a specific project in mind, while others want a more general model for ongoing in-house use, where “they can choose how they want to use it in conjunction with their existing pipeline.” Valenzuela said he expects customized versions of Gen-3 to be used for new productions but declined to elaborate on what those might be.

Customizing video models delivers several things crucial to studios looking at generative AI as an internal production tool.

“Studios want privacy, quality and control, and they want to be able to use their own IP. Quality has improved by leaps and bounds, but the crazy control that filmmakers sometimes want is still not there,” Pinar Demirdag, co-founder and CEO of Cuebric, told VIP+ in April. Cuebric allows studios to fine-tune image generation models (such as Stable Diffusion and Getty's licensed-dataset model) to create local, offline model versions, as VIP+ explained in a June special report.

The first benefit, privacy, is achieved because no one outside the studio has access to its customized model. For the same reason, a proprietary fine-tuned model can be a competitive advantage for the studio.

Second, fine-tuning gives studios more creative control, which is essential to counteract the well-known “slot machine” effect of generating video from text. Output from a fine-tuned model will be more stylistically consistent with the IP, matching the specific aesthetic of the footage it was trained on.

“If you train on movies from the 2020s and movies from the 1950s, you're going to get very different results in terms of film grain, lighting and camera angles,” Valenzuela said.

The reason fine-tuning results in more style-specific output is that the fine-tuned model prioritizes the new data over the base model's original training. For studios, this is likely a desired effect and may also minimize legal risks (discussed below).

But it also means that fine-tuning reduces the performance of the base model, Jain said: the model's capabilities become narrower, making it even more important to choose the fine-tuning data carefully.

“Fine-tuning is not a solved technique. … Imagine you are using a model for a film and you just want to generate assets in the style you want. But you have to accept that the model is now a different model and will only serve this specific purpose,” Jain added.
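One common way practitioners manage that trade-off in image and video models is a low-rank adapter, or LoRA, which keeps the base weights frozen and learns a small correction on top. The sketch below illustrates the idea under that assumption; it is not confirmed as the approach of any vendor mentioned here.

# LoRA-style adapter sketch: the base layer stays frozen, and a small
# low-rank update holds the "new style." Illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base model's weights never change
        # Trainable low-rank factors; B starts at zero so the adapter
        # initially leaves the base model's behavior untouched.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        # Base output plus a learned correction. scale=0 recovers the
        # original model exactly; larger values push output toward the
        # fine-tuning data and away from general-purpose behavior.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

layer = LoRALinear(nn.Linear(128, 128))
out = layer(torch.randn(4, 128))  # identical to the base layer until trained

The rank sets the adapter's capacity: higher ranks capture more of the new style but also pull the model further from its general abilities, which is precisely the narrowing Jain describes.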

Runway has a dedicated data partnerships team that works closely with studios in some cases to help prepare datasets for training (e.g., determine what data is available). Studios have vast archives of content to consider or target for training, including material that has not been digitized and is sitting on a back shelf.

“Someone recently sent us a hard drive of content,” Valenzuela says, “and preparing the dataset would be the process of digitizing it and helping to annotate and label it.”

Data annotation is the necessary process of adding labels or captions to help an AI model interpret the content of an image or video. This is especially important when the data provided for fine-tuning is unusual and not something the model has seen before.
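As a rough sketch of what that annotation step produces, the snippet below writes one caption per clip as JSON Lines. The schema, file paths and captions are invented for illustration; real pipelines vary by vendor.

# Hypothetical annotated-metadata format: one caption per clip, as JSONL.
import json

annotations = [
    {"clip": "archive/reel_042/shot_017.mp4",
     "caption": "Slow dolly-in on a desert canyon at dusk, 35mm film grain"},
    {"clip": "archive/reel_042/shot_018.mp4",
     "caption": "Handheld close-up of rain on a cockpit window, night"},
]

with open("dataset_captions.jsonl", "w") as f:
    for record in annotations:
        f.write(json.dumps(record) + "\n")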

Preparing a dataset for fine-tuning a video generation model would initially seem to raise legal or contractual questions about what data can be packaged and used for training, especially since film footage features many actors' likenesses.

But judging by the SAG-AFTRA contract, studios may not actually be obligated to disclose fine-tuning or the specific data being used. Companies also may not have to restrict which, or how many, of their films or episodes can be used for training.

SAG-AFTRA's contract language on AI addresses the output of an AI model as it affects an actor's performance. Broadly speaking, informed consent and compensation are required only if AI is used to replicate or alter an actor's performance in a specific commercially distributed project. Informed consent is required only if there are plans to use a model to generate the likeness of a specific actor.

“I don't think [the agreement] says that much about this type of training protocol and what you can and can't do,” said Simon Pullman, partner and co-chair of Pryor Cashman's Media + Entertainment and Film, TV + Podcasts groups.

“Almost every entertainment contract written in the last 30-40 years contains language stating that all materials, results and proceeds will be owned by the studio on a 'work for hire' basis and may be used in all media now known and hereafter devised,” Pullman says. “Clearly, these contracts did not contemplate such uses of AI when they were negotiated, so they do not specifically mention AI, and the use of AI is likely permitted on the face of the contract.”

Either way, fine-tuned models may be more likely to be tasked with creating non-actor visuals, such as virtual backgrounds or expensive CGI shots, a role usually reserved for VFX.

It's one thing to train a model, but another to use it. The reality is that studios will be taking on some legal risk if they actually use these models in production. Although fine-tuned models deprioritize non-owned material that may be present in the base model they are tweaked on, fine-tuning is not a panacea for inadvertent copyright infringement that may appear in the output. This is also explained in VIP+'s June report. While we know virtually nothing about what Sora, Gen-3, or any other models were trained on, it is highly unlikely that there will ever be a video generation model that completely excludes copyrighted material.

Owning the fine-tuning data does not mean owning copyright in the fine-tuned model's output, for the simple reason that AI output comes from a machine and is not registrable under current U.S. guidelines. That would likely hold even if the model was trained on entirely new, original creative work, such as camera footage from your own production.

For now, media companies may be thinking of the fine-tuned models they're building as early experiments: opportunities to test and learn about the technology's capabilities in pursuit of cost savings and competitive advantage.
