Vidu: China unveils innovative text-to-video generator to take on rival OpenAI's Sora : Tech : Tech Times

With the launch of this text-to-video generator, Shengshu Technology and Tsinghua University demonstrate their commitment to pushing the boundaries of AI technology.

This partnership highlights the growing importance of AI R&D in China and its potential impact on various industries around the world.

(Photo: Steve Johnson on Unsplash)

Next steps for China's AI innovation

Vidu, a joint venture between Shengshu Technology and Tsinghua University, represents an important milestone in China's AI innovation journey.

This collaboration brings together the expertise of technology startups and acclaimed academic institutions to create a cutting-edge text-to-video generator.

Vidu's announcement at the Zhongguancun Forum in Beijing highlighted it as a notable competitor to OpenAI's Sora.

Unlike Sora's long 60-second video capabilities, Vidu allows users to generate short but high-resolution 16-second video clips with one click, Interesting Engineering reported.

Although Vidu's capabilities may seem limited compared to Sora, its introduction marks a significant step forward in China's AI technology landscape.

As China continues to invest in AI research and development, Vidu embodies China's commitment to innovation and technological advancement.

Zhu Jun, chief scientist at Shengshu and deputy director of Tsinghua AI Research Institute, said Vidu is a significant advance in autonomous innovation and boasts breakthroughs in various areas.

Vidu features imaginative capabilities, the ability to simulate the physical world, and the ability to generate 16-second videos with consistent characters, scenes, and timelines.

Additionally, Zhu emphasized Vidu's proficiency in understanding “Chinese elements.” During the model's debut, Shengshu Technology gave several demonstrations, including scenarios such as a panda playing a guitar on the grass and a puppy swimming in a pool.

Advances in Vidu's architectural framework

Vidu is built on a vision transformer architecture called Universal Vision Transformer (U-ViT). According to the developers, this architecture combines two approaches used in text-to-video AI models: diffusion and the Transformer.

Additionally, this architectural framework facilitates the creation of lifelike videos with dynamic camera movements, complex facial expressions, and authentic lighting and shadow effects.

Zhu said the arrival of Sora resonated with the team's own technological direction and strengthened their resolve to continue their research efforts.

Unlike the many Chinese counterparts to OpenAI's ChatGPT, which debuted in November 2022, Chinese competitors to Sora have only recently caught up with its capabilities.

Industry experts attribute this delay to a significant challenge: a lack of computing power among Chinese companies.

According to Li Yanwei, a Beijing-based technology consultant specializing in intelligent computing, Sora requires eight NVIDIA A100 graphics processing units (GPUs) running for more than three hours to generate a one-minute video clip.

Li points out that Sora requires extensive computing power for inference.
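To put the quoted figures in perspective, the short Python sketch below converts them into GPU-hours. It takes the consultant's numbers at face value (eight GPUs, three hours, a 60-second clip) and treats the three hours as a lower bound, since the estimate was "more than" three hours. The function names are illustrative and do not come from any official tool.

```python
# Back-of-the-envelope estimate based on the figures quoted in the article.
# Assumption: all eight GPUs are busy for the full three hours.

NUM_GPUS = 8            # NVIDIA A100 GPUs, as quoted
WALL_CLOCK_HOURS = 3.0  # "more than 3 hours", so this is a lower bound
CLIP_SECONDS = 60       # one-minute video clip


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed: number of GPUs times elapsed hours."""
    return num_gpus * wall_clock_hours


def gpu_minutes_per_video_second(
    num_gpus: int, wall_clock_hours: float, clip_seconds: int
) -> float:
    """GPU-minutes of compute spent on each second of generated video."""
    return gpu_hours(num_gpus, wall_clock_hours) * 60 / clip_seconds


if __name__ == "__main__":
    total = gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS)
    per_second = gpu_minutes_per_video_second(
        NUM_GPUS, WALL_CLOCK_HOURS, CLIP_SECONDS
    )
    print(f"At least {total:.0f} GPU-hours per one-minute clip")
    print(f"At least {per_second:.0f} GPU-minutes per second of video")
```

On these assumptions, a single one-minute clip costs at least 24 GPU-hours, or roughly 24 GPU-minutes for every second of footage. The real figure would depend on hardware utilization and model details that the article does not provide.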
