① HappyHorse has gone live internally on Alibaba’s Bailian platform and will be officially released within a week. ② The arrival and rapid rise of Alibaba’s HappyHorse signals not only that video generation technology is continuing to improve and mature, but also that competition among domestic video generation models is intensifying.
Science and Technology Innovation Board Daily, April 10 (Reporter Huang Xinqi) HappyHorse, a video generation model that has recently been trending, has registered an account on an overseas social platform, and its first follower is Alibaba Group. Science and Technology Innovation Board Daily has learned from sources that HappyHorse has gone live internally on Alibaba’s Bailian MaaS platform and is expected to be officially released within a week.
On April 7, the then-anonymous model, listed as HappyHorse-1.0, emerged as a dark horse by topping the Artificial Analysis Video Arena leaderboard with an Elo score of 1333. HappyHorse ranked first by Elo score in all four categories, namely text-to-video and image-to-video generation, each with and without audio, surpassing popular video generation models such as ByteDance’s Seedance 2.0, Kunlun Wanwei’s SkyReels V4, and Kuaishou’s Keling AI 3.0.
As of April 9, HappyHorse scored 1383 points in the text-to-video generation (no audio) category, 110 points ahead of second-place Seedance. In the image-to-video generation (no audio) category, HappyHorse scored 1413 points, setting a new record on the leaderboard.
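For readers unfamiliar with Elo ratings, the short Python sketch below shows what a 110-point gap would mean under the textbook Elo formula. It assumes the standard logistic curve with a 400-point scale; the article does not say how Artificial Analysis computes its arena scores, so the function name and the scale are illustrative assumptions, not the leaderboard's actual method.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under textbook Elo.

    Assumption: standard logistic Elo with a 400-point scale. The arena's
    real scoring method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Figures from the article: 1383 points, 110 ahead of second place.
leader = 1383
runner_up = leader - 110

win_share = elo_expected_score(leader, runner_up)
print(f"Expected preference rate for the leader: {win_share:.1%}")
```

Under these assumptions, a 110-point lead corresponds to the leader being preferred in roughly 65% of head-to-head comparisons against the runner-up.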
HappyHorse-1.0 is reported to be the world’s first open-source large video model with native support for joint audio and video generation. It has 15 billion parameters and a 40-layer unified self-attention Transformer architecture. On a single H100, it reportedly takes about 38 seconds to generate a 5-second 1080p video. The model natively supports lip-syncing in seven languages (English, Mandarin, Cantonese, Japanese, Korean, German, and French) and is said to have the lowest word error rate of any comparable open-source model.
Earlier reports suggested that HappyHorse was developed by a team led by Zhang Di at Future Life Lab, part of Alibaba subsidiary Taotian Group. Future Life Lab has since been separated from Taotian Group and folded into the AI Innovation Division of the newly created ATH business group.
On March 16, Alibaba CEO Wu Yongming announced the creation of the ATH (Alibaba Token Hub) business group, which he leads directly. ATH brings together five divisions (Tongyi Lab, the MaaS Business Line, the Qwen Division, the Wukong Division, and the AI Innovation Division), consolidating Alibaba’s core AI resources around the stated goal of “token creation, token distribution, and token application.”
On April 8, Alibaba went further, establishing a group technical committee chaired by Wu Yongming and upgrading Tongyi Lab to the Tongyi Large Model Department, which is currently headed by Zhou Jingren. According to Alibaba, these organizational adjustments are meant to concentrate its strongest people and resources on the most critical battlefields, and they signal that Alibaba has entered a phase of across-the-board AI acceleration.
The AI video generation industry is at a critical juncture in its move toward commercialization. Goldman Sachs predicts that the global market will grow from about $3 billion in 2025 to about $29 billion by 2030, an almost 10-fold increase in five years.
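The arithmetic behind the forecast can be checked with a few lines of Python. This is a minimal sketch that takes the two rounded figures quoted above at face value and assumes smooth compound annual growth over the five years from 2025 to 2030; the function names are illustrative, and the implied annual rate is a derived figure, not one attributed to Goldman Sachs.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by the two endpoints.

    Assumption: growth compounds evenly each year, which real markets
    rarely do.
    """
    return (end / start) ** (1 / years) - 1


# Rounded figures from the article, in billions of US dollars.
market_2025 = 3.0
market_2030 = 29.0

multiple = growth_multiple(market_2025, market_2030)
cagr = implied_cagr(market_2025, market_2030, years=2030 - 2025)
print(f"Multiple: {multiple:.1f}x, implied annual growth: {cagr:.0%}")
```

On these rounded inputs, the multiple comes to about 9.7 times, consistent with "almost 10-fold", and the implied annual growth rate is roughly 57%.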
Science and Technology Innovation Board Daily has learned that Alibaba Cloud’s sales team has begun pursuing customers in the AI short-drama and comic-drama segments.
Until now, video generation models such as ByteDance’s Seedance 2.0, Kuaishou’s Keling 3.0, and Aishi Technology’s PixVerse had ranked high on global leaderboards and were gradually building a competitive advantage.
Alibaba’s forceful entry could make it another “catfish” that stirs up the market. The arrival of HappyHorse and its rapid rise to the top signal not only that video generation technology is continuing to improve and mature, but also that competition among domestic video generation models is intensifying.
Editor/Rice
