The fact that we can now create our own clips using AI video generators like Luma AI's Dream Machine and OpenAI's Sora is both terrifying and fascinating.
While AI-created videos are entertaining to watch with their random themes and styles, they often share a flaw: they tend to come without good-quality audio.
Luckily, Google DeepMind has a tool designed to solve this problem.
Introducing Google DeepMind's Video to Audio Conversion Tool
(Photo: Jeremy Yap on Unsplash)
We may soon see a proliferation of AI-generated movies on the internet, and much of that will depend on soundtracks of the kind Google DeepMind's tool can generate.
According to TechRadar, Google DeepMind has unveiled a video-to-audio (V2A) tool that uses video pixels and text prompts to create soundtracks and soundscapes for AI-generated videos, a development that marks a major step towards fully automated cinematic scene creation.
V2A technology can work with AI video generators like Google's Veo to create atmospheric scores, sound effects, and dialogue that match the characters and tone of a video. Matching sound to picture is tricky, but with the right audio choices, the tool could prove powerful for the next generation of cinema.
Unlimited soundtrack generation via text prompts
It's difficult to add sounds that match the main theme of a movie, and Google's AI research lab says its V2A tool can achieve results beyond what creators might imagine.
DeepMind's V2A tool can generate an unlimited number of soundtracks for any video input. Creators can then tweak the audio output using simple text prompts, which gives them greater creative flexibility.
Unlike its competitors, this tool can generate audio from video pixels alone, making text prompts optional rather than required.
Ensuring safety and preventing misuse
DeepMind recognises that this powerful technology could be misused to create deepfakes.
As a precautionary measure, the V2A tool is currently for research purposes only. DeepMind plans to conduct rigorous safety assessments and extensive testing before releasing it to the public. This cautious approach is intended to mitigate risks and ensure responsible use of the technology.
Huge potential for filmmaking and animation
The potential applications of V2A tools are wide-ranging, especially in amateur filmmaking and animation.
Demo examples include a Blade Runner-esque scene with electronic music and a cartoon with baby dinosaurs, suggesting that the tool could significantly reduce production costs.
Although there are some limitations, especially when it comes to interaction, the technology is likely to improve over time.
The Future of AI-Generated Media
Integrating AI-generated video with AI-created soundtracks and sound effects is an incredible leap for the industry.
OpenAI has announced plans to add audio to its Sora video generator, due for release later this year. Speaking of which, director Paul Trillo released the first music video made with Sora back in May.
DeepMind's V2A tool has already demonstrated advanced capabilities, generating audio based solely on video content without detailed instructions.
How the V2A tool works
DeepMind's V2A tool employs a diffusion model that combines video pixels with information from the user's text prompts to produce compressed audio, which it then decodes into an audio waveform.
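To make that pipeline a little more concrete, here is a rough toy sketch in Python. It is not DeepMind's code: every function name and parameter below is hypothetical, the "encoders" are simple arithmetic stand-ins, and the denoising step just nudges random noise toward the conditioning signal where a real diffusion model would run a trained neural network. It only illustrates the order of operations described above: encode the video (and, optionally, a prompt), refine a noisy compressed representation step by step, then decode it into a waveform.

```python
import math
import random

SAMPLES_PER_LATENT = 8  # hypothetical: audio samples produced per latent value


def encode_video(frames):
    """Toy stand-in for a video encoder: one number (mean pixel value) per frame."""
    return [sum(frame) / len(frame) for frame in frames]


def encode_prompt(prompt, dim):
    """Toy stand-in for a text encoder: folds character codes into a vector of length dim."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += (ord(ch) % 32) / 32.0
    return vec


def denoise_step(latent, condition, step, total_steps):
    """Toy denoiser: move the latent part of the way toward the conditioning signal.

    A real diffusion model would use a trained network to predict and remove noise.
    """
    blend = 1.0 / (total_steps - step)  # reaches 1.0 on the final step
    return [x + blend * (c - x) for x, c in zip(latent, condition)]


def decode_latent(latent):
    """Toy decoder: each latent value sets the amplitude of one short sine cycle."""
    waveform = []
    for value in latent:
        amplitude = math.tanh(value)  # keeps samples within [-1, 1]
        for k in range(SAMPLES_PER_LATENT):
            waveform.append(amplitude * math.sin(2 * math.pi * k / SAMPLES_PER_LATENT))
    return waveform


def generate_audio(frames, prompt=None, steps=10, seed=0):
    """Run the toy pipeline; the text prompt is optional, mirroring the article."""
    condition = encode_video(frames)
    if prompt:
        text = encode_prompt(prompt, len(condition))
        condition = [v + t for v, t in zip(condition, text)]
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in condition]  # start from pure noise
    for step in range(steps):
        latent = denoise_step(latent, condition, step, steps)
    return decode_latent(latent)


if __name__ == "__main__":
    frames = [[0.1, 0.2], [0.5, 0.5], [0.9, 1.0]]  # three tiny made-up "frames"
    print(len(generate_audio(frames)), "samples generated from pixels alone")
    print(len(generate_audio(frames, prompt="rain")), "samples generated with a prompt")
```

Calling generate_audio with only frames mirrors the pixels-only mode, while passing a prompt shifts the conditioning signal and therefore the resulting waveform.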
While details about the training data are unclear, Google's access to YouTube is a big advantage: some YouTube creators have deals that allow their content to be used to train AI models, which may have contributed to the development of the tool.
A game changer for content creation
While there are still challenges to overcome, especially when it comes to producing Hollywood-quality dialogue, DeepMind's V2A tool could become a powerful asset for storyboarders and amateur filmmakers.
Given the rapid advancements and fierce competition in the AI field, we expect these tools to continue improving and offer even greater capabilities in the near future.
In other news, Chinese short video app Kuaishou has launched an AI-powered video generator that can create 1080p videos up to two minutes long.
ⓒ 2024 TECHTIMES.com All rights reserved. Please do not reproduce without permission.
