Alibaba announces WAN2.2, an open source suite for AI video creation

Alibaba has announced the release of WAN2.2, an open source suite of large-scale video generation models based on the Mixture-of-Experts (MoE) architecture.

Model Features

The WAN2.2 series includes the text-to-video model WAN2.2-T2V-A14B, the image-to-video model WAN2.2-I2V-A14B, and the hybrid model WAN2.2-TI2V-5B. Each model is designed to improve quality, efficiency, and user control when generating cinematic-style videos from text prompts and images.

Both WAN2.2-T2V-A14B and WAN2.2-I2V-A14B use the MoE architecture and are trained on data curated for cinematic aesthetics. These models allow creators to adjust multiple video properties such as lighting, time of day, color tone, camera angle, frame size, composition, and focal length. According to Alibaba, the models can render complex movements, including detailed facial expressions and elaborate sports scenes, while following instructions and physical laws more closely than before.

To address the high computational cost of video generation, which is driven largely by long token sequences, WAN2.2-T2V-A14B and WAN2.2-I2V-A14B employ a two-expert design in the denoising process of the diffusion model. One expert focuses on the overall layout of the scene under high noise, while the other refines details under low noise. Each model has a total of 27 billion parameters, but only 14 billion are active per step. Alibaba claims this reduces computational consumption by up to half.
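The two-expert idea described above can be illustrated with a minimal Python sketch. It is not Alibaba's implementation: the switch point BOUNDARY, the linear noise schedule, and the expert names are hypothetical, and only the 27 billion and 14 billion parameter figures come from the announcement.

```python
from typing import List

# Figures quoted in the announcement (billions of parameters).
TOTAL_PARAMS_B = 27
ACTIVE_PARAMS_B = 14

# Hypothetical noise level at which the model hands over from one expert to the other.
BOUNDARY = 0.5


def select_expert(noise_level: float, boundary: float = BOUNDARY) -> str:
    """Return the expert that would handle a denoising step at this noise level."""
    # High noise: rough scene layout. Low noise: fine detail.
    return "high_noise_layout" if noise_level >= boundary else "low_noise_detail"


def toy_schedule(num_steps: int) -> List[float]:
    """A toy, linearly decreasing noise schedule starting at 1.0."""
    return [1.0 - i / num_steps for i in range(num_steps)]


# Only one expert runs at each step, so only part of the model is active at a time.
used = [select_expert(level) for level in toy_schedule(10)]
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(used[0], used[-1])
print(f"active share of parameters per step: {active_fraction:.0%}")
```

Because each step runs a single expert, the per-step cost tracks the 14 billion active parameters rather than the 27 billion total, which is consistent with the "up to half" figure Alibaba cites.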

Aesthetic tuning

WAN2.2 introduces a film-inspired prompt system that lets users shape results along key categories such as lighting, composition, and color tone. The company says this approach allows the models to interpret and deliver users' aesthetic requests more accurately across video generation tasks.
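As a rough illustration of category-based prompting, the sketch below assembles a plain text prompt from labelled aesthetic descriptors. The announcement does not describe how WAN2.2 parses prompts, so the build_prompt helper, the "category: value" format, and the example values are all assumptions made for illustration.

```python
def build_prompt(subject: str, **aesthetics: str) -> str:
    """Join a subject with labelled aesthetic descriptors into one text prompt.

    Hypothetical helper: the label format is an assumption, not a WAN2.2 API.
    """
    parts = [subject]
    for category, value in aesthetics.items():
        # Turn keyword names such as "color_tone" into readable labels.
        parts.append(f"{category.replace('_', ' ')}: {value}")
    return ", ".join(parts)


prompt = build_prompt(
    "a cyclist crossing a bridge",
    lighting="soft backlight",
    composition="wide shot",
    color_tone="warm",
)
print(prompt)
```

Keeping each category as a separate keyword makes it easy to vary one property, such as lighting, while holding the rest of the prompt fixed.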

Alibaba expanded the training dataset for WAN2.2, reporting 65.6% more image data and 83.2% more video data than for the previous version, WAN2.1. The larger dataset is intended to improve generalization and creative diversity, allowing the models to produce more complex scenes and a wider artistic range.

Hybrid models and efficiency

The hybrid model, WAN2.2-TI2V-5B, is a dense model built on a 3D variational autoencoder (VAE) architecture with a temporal and spatial compression ratio of 4x16x16, which Alibaba says gives an overall information compression rate of 64. Alibaba says the TI2V-5B can generate a 5-second 720p video in minutes on a single consumer-grade GPU.
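The article does not show how the 4x16x16 figure relates to the overall rate of 64. One plausible reading, sketched below, counts values rather than positions: the channel counts (3 input channels, 48 latent channels) are assumptions not stated in the announcement, so treat this as a worked guess rather than a confirmed derivation.

```python
# Compression strides quoted in the article: time x height x width.
T_STRIDE, H_STRIDE, W_STRIDE = 4, 16, 16

# Assumed channel counts (not stated in the article): RGB input, 48-channel latent.
INPUT_CHANNELS = 3
LATENT_CHANNELS = 48

# Number of input pixel positions that map to a single latent position.
positions_per_latent = T_STRIDE * H_STRIDE * W_STRIDE

# Input values per latent position, divided by the values stored at that position.
values_in = positions_per_latent * INPUT_CHANNELS
information_ratio = values_in // LATENT_CHANNELS

print(positions_per_latent, information_ratio)
```

Under these assumptions, each latent position covers 1,024 pixel positions, and the wider latent channel dimension brings the net reduction in stored values down to 64.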

“The TI2V-5B generates 5 seconds of 720p video in minutes on a single consumer-grade GPU, enabling efficiency and scalability for developers and content creators.”

Open Source and Community Engagement

All WAN2.2 models can be downloaded from Hugging Face, GitHub, and ModelScope, Alibaba Cloud's open source platform. Alibaba open-sourced four WAN2.1 models in February 2025 and WAN2.1-VACE in May 2025; collectively, the models have been downloaded more than 5.4 million times on Hugging Face and ModelScope.

“Alibaba, a leading contributor to the global open source community, open-sourced four WAN2.1 models in February 2025 and WAN2.1-VACE (Video All-in-one Creation and Editing) in May 2025.”

Alibaba's WAN2.2 release highlights continued activity within the open source ecosystem and the ongoing development of video generation models aimed at supporting creators and developers worldwide.
