Alibaba introduces ThinkSound: AI models that generate realistic audio for videos

Released on July 16, 2025

Creating high-quality audio for video content presents technical and creative challenges that affect both beginners and experienced audio professionals. Producers often have to manage noise, balance dialogue and sound effects, meet budget and time constraints, and maintain creative consistency. On top of that, they must translate an artistic vision into a cohesive final product that accurately reflects the visual dynamics, acoustic environment, and timing of the video.

To address these challenges, Alibaba's Tongyi Speech Lab has introduced ThinkSound, a new open-source multimodal LLM that uses Chain-of-Thought (CoT) reasoning for advanced audio generation and editing. ThinkSound offers a structured, interactive approach to audio production that is specifically tailored to video content. The model is available in three compact sizes (1.3B, 724M, and 533M parameters) and supports audio generation from video, text-based audio editing, and interactive audio creation, even on edge devices.

https://www.youtube.com/watch?v=2kr4z9o6srk

ThinkSound mimics the multi-stage workflow of human sound designers and ensures that the generated audio remains contextually accurate, cohesive and high quality throughout production. This model first analyzes the visual dynamics of the video, logically interprets the corresponding acoustic attributes, and then synthesizes audio suitable for the context.
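The three stages described above (analyze the visuals, reason about matching acoustic attributes, then synthesize) can be pictured as a simple pipeline. The following is a toy sketch only, not ThinkSound's code or API: every name in it (`VisualEvent`, `SoundCue`, `ACOUSTIC_HINTS`, `generate_soundscape`) is hypothetical, the lookup table stands in for the model's reasoning step, and the "synthesis" stage merely returns a time-ordered cue sheet instead of audio.

```python
from dataclasses import dataclass


@dataclass
class VisualEvent:
    """Hypothetical description of something seen in the video."""
    label: str      # e.g. "door closing"
    start_s: float  # event start, in seconds
    end_s: float    # event end, in seconds


@dataclass
class SoundCue:
    """Hypothetical plan for one sound to be produced."""
    description: str
    start_s: float
    end_s: float


# Toy lookup standing in for the reasoning step (an assumption for
# illustration, not how ThinkSound actually infers acoustic attributes).
ACOUSTIC_HINTS = {
    "door closing": "short wooden thud with a faint latch click",
    "rain on window": "steady soft patter, slightly muffled",
}


def analyze_visuals(annotations):
    """Stage 1: turn (label, start, end) tuples into visual events."""
    return [VisualEvent(label, start, end) for label, start, end in annotations]


def infer_acoustics(events):
    """Stage 2: map each visual event to a sound description with the same timing."""
    return [
        SoundCue(
            ACOUSTIC_HINTS.get(e.label, f"generic sound for {e.label}"),
            e.start_s,
            e.end_s,
        )
        for e in events
    ]


def synthesize(cues):
    """Stage 3: placeholder 'synthesis' that returns a time-ordered cue sheet."""
    return sorted(cues, key=lambda c: c.start_s)


def generate_soundscape(annotations):
    """Run the three stages in order."""
    return synthesize(infer_acoustics(analyze_visuals(annotations)))
```

Calling `generate_soundscape([("rain on window", 2.0, 8.0), ("door closing", 0.5, 1.0)])` would return the door cue first, followed by the rain cue, each carrying the timing of the visual event it was derived from.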

With this approach, ThinkSound lets users create detailed and consistent soundscapes, refine the generated audio through intuitive interaction, and edit specific audio segments using natural language instructions, bridging the gap between creative intent and automated audio production.

Furthermore, the Alibaba research team introduced AudioCoT, a large multimodal dataset with audio-specific CoT annotations that improve alignment between visual content, text descriptions, and sound synthesis.

Large-scale evaluations have shown that ThinkSound achieves state-of-the-art performance in video-to-audio generation, producing contextually accurate, precisely timed soundscapes. The model performs strongly on both traditional audio quality metrics and CoT-based evaluations. Furthermore, on MovieGen Audio Bench, a benchmark that evaluates audio generation for video, ThinkSound significantly outperforms other leading models.

ThinkSound 1: *Comparison of ThinkSound foundation models with existing video-to-audio baselines on the VGGSound test set. ↓ indicates lower is better; ↑ indicates higher is better.*

ThinkSound can integrate with a variety of video generation models to provide realistic narration and soundtracks for synthesized videos. Its audio generation capabilities have potential applications in film and television sound design, audio post-production, and immersive sound experiences in gaming and virtual reality.

ThinkSound is now open source and available on Hugging Face, GitHub, and Alibaba's Model Studio.

ThinkSound 2: *Out-of-distribution evaluation on the MovieGen Audio Bench.*
