Chinese team presents first text-to-video AI model comparable to Sora


Presentation of the text-to-video AI model Vidu at the 2024 Zhongguancun Forum on April 27, 2024. Photo: Provided by Zhongguancun Forum

Chinese technology company ShengShu-AI and Tsinghua University on Saturday unveiled the text-to-video artificial intelligence (AI) model Vidu. It is said to be the first model of its kind in China on a par with Sora, and is another sign of China's rapid development in an important emerging field of AI.

Announced at the ongoing Zhongguancun Forum in Beijing, Vidu can generate 16-second 1080P video clips with a single click. It is built on an in-house developed visual transformation model architecture called Universal Vision Transformer (U-ViT), which integrates two text-to-video AI models, Diffusion and Transformer, the developers said.

The AI text-to-video model comes about two months after Sora, developed by US-based OpenAI, was released to great fanfare around the world.

“After the release of Sora, we found that Sora closely aligned with our technology roadmap, which gave us even more motivation to pursue research with determination,” Zhu Jun, deputy director of the Institute of Artificial Intelligence at Tsinghua University and principal researcher at ShengShu-AI, said at the forum.

U-ViT, Vidu's core technology, was first proposed by the research team in September 2022, earlier than the Diffusion Transformer (DiT) model architecture used by Sora. According to media reports, it is the world's first visual transformation model architecture that combines the benefits of Diffusion and Transformer.

Saturday's live demonstration showed that Vidu can simulate the real physical world and generate scenes with intricate details that follow real-world physics, including proper light and shadow effects and delicate facial expressions. It can also generate complex dynamic shots rather than only static ones.

Additionally, media reports say that Vidu, which was developed in China, has a good understanding of Chinese elements and can generate images of distinctively Chinese figures such as the panda and the loong (Chinese dragon).

Global Times


