A Chinese startup has unveiled an artificial intelligence-powered system that can generate up to 16 seconds of high-definition video, marking a major breakthrough for China's AI industry as it races to catch up with American giants.
Shengshu-AI, a Beijing-based startup founded just last year, unveiled the new system, which the company named Vidu, on Saturday at the Zhongguancun Forum in Beijing, describing it as China's first long-duration, highly consistent, and highly dynamic video generation model.
Many in China were quick to hail Vidu as China's answer to Sora, the text-to-video model created by OpenAI that shocked the world when it was announced in February.
As of now, it seems that Vidu is still far from being able to match Sora's abilities. According to Shengshu-AI, Vidu can generate high-definition videos of up to 16 seconds, while Sora can generate 60-second clips.
But this still puts Vidu at the forefront of the rapidly evolving AI-generated content space. Most of the major text-to-video models, including Pika and Gen-2, only produce clips of up to 4 seconds.
Unlike these models, Vidu is not yet available to the public, and Shengshu-AI has not yet confirmed when it will be officially launched. However, the company conducted a live demonstration of the system at the forum and said it is open to working with partners to further fine-tune the technology.
Shengshu-AI is one of the many startups that have emerged amid the AI-related investment frenzy in China since the release of OpenAI's ChatGPT in late 2022.
The company was founded in March 2023, and Zhu Jun, a leading AI researcher from Beijing's prestigious Tsinghua University, joined as a principal researcher. It has since raised more than 100 million yuan ($14 million) from investors including Chinese tech giants Ant Group and Baidu.
At the Zhongguancun Forum, Zhu said Vidu complies with the laws of physics and can generate scenes with rich details such as realistic shadow effects and facial expressions.
In another nod to Shengshu-AI's ambitions to compete with OpenAI, a subsequent live demonstration of Vidu included a video almost identical to one used to launch Sora: a clip of a car driving down a mountain road.
The key technology behind Vidu is Universal Vision Transformer, which combines two AI models: Transformer and Diffusion. This is similar to Sora's Diffusion Transformer architecture, but Shengshu-AI claims that its research team developed the system before OpenAI and published a related paper in September 2022.
“After Sora was released in February, we saw a high degree of alignment between our technology roadmaps and became even more determined to further our own research,” Zhu said at the forum.
The release of Sora earlier this year surprised many in China, as the technical challenges involved in generating AI video far exceed those associated with creating text or still images. The hashtag “Sora” gained over 100 million views on the Chinese microblogging platform Weibo within a week of the product's launch.
There were concerns within China's AI industry that Sora's launch signaled a growing gap between Silicon Valley and China. However, Shengshu-AI is bullish on its ability to catch up with the US market leader.
As of February, Vidu was reportedly only able to generate 4-second clips, but that length has quadrupled in just a few months. “It's true that this model can reach the level of Sora this year, but whether it will take three months or six months is difficult to say,” Tang Jiayu, CEO of Shengshu-AI, told domestic media in March.
With the demonstration of Vidu, Shengshu-AI has proven itself as a leader in China's AI sector, Chen Chen, a partner at consultancy firm Analysys, told domestic media. However, Chen added, Sora remains far ahead in terms of video length, variety, and richness.
China's technology industry continues to invest heavily in AI content generation. Key AI models such as ChatGPT, Stable Diffusion, and Midjourney are not available in China, leaving a large gap in the market for domestic companies to fill.
In recent months, large technology companies such as ByteDance, Kuaishou, Tencent, and SenseTime, as well as a number of smaller companies, have reported progress in the development of text-to-video AI tools. But some companies stress that their products are still in the early stages.
According to market research firm iResearch, the value of China's AI-generated content market is expected to grow at 87% annually over the next 10 years, twice as fast as the global market.
(Header image: Shengshu Technology and Tsinghua University unveil Vidu, a text-to-video model, at the 2024 Zhongguancun Forum in Beijing, April 27, 2024. CNS)
