Google is working on AI soundtracks and dialogue generation for videos

We all know that sound is a key element of most films and videos. Even in the era of silent movies, screenings were accompanied by music to convey emotion to the audience.

The same holds for today's eerily quiet generative AI videos. It's one of the reasons Google is working on video-to-audio (V2A) technology that "enables synchronized audio-visual generation." On Monday, Google's AI lab, DeepMind, shared its progress on generating audio, including soundtracks and dialogue, that automatically matches AI-generated videos.

Google has been hard at work on multimodal generative AI to compete with its rivals. OpenAI has an AI video generator, Sora (not yet publicly available), and GPT-4o, which can respond with an AI-generated voice. Companies like Meta and Suno are exploring AI-generated audio and music, but combining audio with video is relatively new. ElevenLabs has a similar tool that matches audio to text prompts, but DeepMind says V2A is different because it doesn't require text prompts.

V2A can be paired with AI video tools like Google Veo, and it can also be applied to existing archival footage and silent films. It can generate soundtracks, sound effects, and even dialogue. It uses diffusion models trained on visual input, natural language prompts, and video annotations to iteratively refine random noise into audio that matches the tone and context of the video.

Google DeepMind said V2A "can understand raw pixels," so text prompts aren't strictly necessary to generate speech, though they can improve accuracy. The model can also be prompted to make the tone of the speech more positive or negative. Alongside the announcement, DeepMind released several demo videos, including a dark, spooky hallway set to horror music, a lonely cowboy at dusk accompanied by gentle harmonica, and an animated character talking about dinner.

V2A will include Google's SynthID watermarking as an anti-abuse measure, and DeepMind said in a blog post that the technology is still being tested before any public release.
