Alibaba announces WAN2.2-S2V video generation AI model

Alibaba Group has announced its latest open-source AI model, WAN2.2-S2V (Speech-to-Video). According to the company, the new model lets users turn portrait photos into expressive, "movie-quality" avatars that can speak, sing and perform.

WAN2.2-S2V is part of the WAN2.2 video generation series. It uses a single image and an audio clip to create fully animated videos with a variety of framing options, including portrait, bust and full-body perspectives. The model can also generate a wide range of character actions and environmental conditions.

It also supports a variety of audio inputs, including natural dialogue and musical performances. The model's video generation capabilities are not limited to human avatars: the company says WAN2.2-S2V supports a diverse range of characters, including cartoons, animals and other stylized figures.

Alibaba also claims that the model's innovative frame processing technology significantly reduces computational overhead, enabling stable long-video generation. In addition, the model is designed to meet the needs of a range of content creators, supporting formats from vertical short-form clips to traditional horizontal film. Users can choose an output resolution of either 480p or 720p.

The model can be downloaded from Hugging Face and GitHub. It is also available on ModelScope, Alibaba Cloud's open-source community.

(Source: Alibaba Press Release)
