Google DeepMind, Google's AI research lab, recently announced V2A, a new model that can generate audio from video.

Video generation models such as Sora, Dream Machine, Veo, and Kling are rapidly improving, allowing users to generate videos from text prompts. However, most of these systems can only produce silent videos. Google DeepMind appears to recognize this problem and is developing a new AI model that can generate soundtracks and dialogue for videos.
In a blog post, the tech giant's AI research lab unveiled V2A (Video to Audio), a new AI model in development that combines video pixels with natural language text prompts to generate a rich soundscape for the on-screen action.
V2A is compatible with Veo, the text-to-video model the company announced at the recently concluded Google I/O 2024, and can be used to add dramatic music, realistic sound effects, and dialogue that matches the mood of a video. Google says the new model can also be used with “traditional footage,” such as silent films and archival material.
The new V2A model can generate “an unlimited number of soundtracks” for any video. It supports optional “positive prompts,” which steer the output toward desired sounds, and “negative prompts,” which steer it away from unwanted ones, so users can tailor the result to their preferences. The generated audio is also watermarked with SynthID technology.
DeepMind's V2A technology takes a text description of the desired sound as input and uses a diffusion model trained on a combination of audio, dialogue transcripts, and videos. Because the model has not been trained on many videos, the output can sometimes be distorted. Google also said it will not open V2A to the public anytime soon, in order to prevent misuse.
© IE Online Media Services, Inc.
First uploaded: 18 Jun 2024 17:10 IST

