Baidu's MuseStreamer AI Video Generation Model Takes On Google's Veo 3 With Native Audio Support: Report

Baidu reportedly released a new artificial intelligence (AI) video generation model on Wednesday. According to the report, the MuseStreamer AI model can also integrate Chinese audio into the generated video, making it only the second such model after Google's Veo 3. The technology giant claims it is the world's first AI model to support native Chinese audio generation. Alongside the AI model, the company has reportedly also launched a new video content creation platform called Huixiang. Notably, neither MuseStreamer nor Huixiang can be used outside of China.

Baidu's MuseStreamer is reportedly capable of generating Chinese audio

The world of AI video generation models has evolved significantly over the past two years. We have gone from models that struggled to generate people with the correct number of fingers to models that can accurately portray realistic physics and movement. However, one area most AI players have steered clear of is video generation with native audio.

At Google I/O 2025, the tech giant became the first company to offer this capability with Veo 3, which quickly became the talk of the town and left its biggest rival, OpenAI's Sora, behind. The Mountain View-based tech giant recently expanded Veo 3 to all 154 countries where the Gemini app is available, highlighting the company's aggressive push for the tool.

However, according to a report from an Asian technology publication (via AIBase), Chinese tech giant Baidu has also entered the race with its MuseStreamer AI model. It is said to be the only model capable of generating videos with Chinese audio. Notably, Veo 3 can only generate audio in English.

MuseStreamer not only generates dialogue that is synchronised with the video, but also lets users add sound effects and ambient noise to their videos. Baidu reportedly claims that the model scored 89.38 percent on the VBench I2V (image-to-video) benchmark, placing it at the top of the rankings. The tech giant is pitching the model as a consumer content creation tool.

In addition to the AI model, Baidu has reportedly also launched a new video content platform called Huixiang. Huixiang is said to serve as a front end for the AI model, allowing users to share prompts and generate videos. The platform currently supports generating 10-second videos at 1080p resolution, the report says. In comparison, Veo 3 can only generate videos that are 8 seconds long. It is not clear what the default aspect ratio of the videos is, or whether users can generate videos in different aspect ratios.
