Microsoft has introduced a new artificial intelligence (AI) model that can generate hyper-realistic videos of talking human faces. The AI image-to-video model, called VASA-1, can generate a video from just a single photo and a speech audio clip. According to the company, the generated videos have lip movements synchronized with the audio, along with natural-looking facial expressions and head movements. Notably, the tech giant says that it does not intend to release any product or API using the VASA-1 model, and that the research is aimed at creating realistic virtual characters.
In a post on its research announcement page, Microsoft detailed the AI model it is developing and highlighted its capabilities. The company claims that VASA-1 can produce videos at 512 x 512 resolution at up to 40 FPS. The model is also said to support online video generation with negligible start-up delay. X (formerly Twitter) user Kaioken shared a video of the AI model in action.
VASA-1's biggest accomplishment is its ability to render up to a minute of high-quality video (according to the demo) from a single still image, and the company says the model can also generate lip movements that match the audio file. Microsoft also emphasized the model's ability to generate the facial expressions that accompany the speech. The AI video generation model additionally offers fine-grained control over various aspects of the video, such as main gaze direction, head distance, and emotion offset. These controls over disentangled appearance, 3D head pose, and facial dynamics let users adjust the output precisely.
Additionally, the AI model was able to generate videos from artistic photos, singing audio, and non-English speech. Microsoft researchers note that these types of input were absent from the training data, suggesting that the model can generalize beyond what it was trained on.
While it is impressive that an AI model can generate hyper-realistic videos of real people from arbitrary audio, it also raises questions about potential unethical use, especially the creation of deepfakes. The company stressed that it does not intend to release the AI model to the public, but rather to use it to create interactive virtual characters.
Microsoft also said the technology can be used to improve forgery detection. The company acknowledged the potential for abuse but said it is essential to recognize that the technology has significant positive potential, citing benefits such as providing companionship and therapeutic support to those in need. These benefits, it said, highlight the importance of its research and other related explorations, and it added that it is dedicated to developing AI responsibly with the goal of advancing human well-being.
