Picsart has developed a new text-to-video generative AI model

Picsart’s Artificial Intelligence Research Team (PAIR) has built a new generative model that can create entirely new video content from text descriptions alone.

The technology, often referred to as text-to-video generative artificial intelligence (AI), was released as an open-source demo on Twitter and published on GitHub and Hugging Face. The team behind it has also published a research paper describing the methodology.

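“Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets. In this paper, we introduce a new task of zero-shot text-to-video generation and propose a low-cost approach (without any training or optimization) by leveraging the power of existing text-to-image synthesis methods (e.g., Stable Diffusion), making them suitable for the video domain,” the researchers explain.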

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

Abs: https://t.co/5xCsj4PNRj
github: https://t.co/BdSzlepGQG pic.twitter.com/XY4piH6j4v

—AK (@_akhaliq) March 24, 2023

The main problem with text-to-video generative AI today is that while the general idea of what is being produced stays consistent, its rendering does not. The main subject often looks slightly different from frame to frame, and the background shifts as well, so the finished video looks as if everything is constantly in motion and therefore lacks realism. The PAIR team set out to counter this.

The researchers explain that their key changes compared to other attempts at text-to-video generation include “enriching the latent codes of the generated frames with motion dynamics.” This keeps the global scene and background consistent over time. The approach also does a better job of preserving the context, appearance and identity of the foreground subject than many other generative video systems.
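
To make the idea of adding motion dynamics to latent codes more concrete, here is a toy sketch in Python with NumPy. It is not PAIR's implementation: it assumes the motion is a simple global translation, and the function name `motion_enriched_latents` and its `direction` and `strength` parameters are hypothetical, chosen only for illustration. Each frame's latent is the first frame's latent shifted a little further along one direction.

```python
import numpy as np

def motion_enriched_latents(first_latent, num_frames, direction=(1, 1), strength=1):
    """Return one latent per frame by shifting the first frame's latent.

    first_latent: array of shape (channels, height, width).
    direction:    hypothetical global motion (dy, dx), in latent pixels per frame.
    strength:     hypothetical multiplier applied to the per-frame shift.
    """
    dy, dx = direction
    frames = []
    for k in range(num_frames):
        # Frame 0 is unchanged; frame k is displaced k times as far as frame 1.
        shift = (strength * k * dy, strength * k * dx)
        # np.roll wraps around at the edges, a simplification of a real warp.
        frames.append(np.roll(first_latent, shift=shift, axis=(1, 2)))
    return np.stack(frames)

# Random noise stands in for the latent code of the first frame.
rng = np.random.default_rng(0)
first = rng.standard_normal((4, 8, 8))
latents = motion_enriched_latents(first, num_frames=4)
print(latents.shape)  # (4, 4, 8, 8)
```

Because every frame starts from the same latent, moved only slightly each time, the frames share one underlying scene, which is the intuition behind the more stable backgrounds. In a real system each of these latents would still have to be denoised by the diffusion model to produce an image.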

Picsart Generative Video AI — “A cute cat running in a beautiful meadow”

“Experiments show that this leads to low overhead, yet high-quality and remarkably consistent video generation. Moreover, our approach is not limited to text-to-video synthesis but is also applicable to other tasks such as conditional and content-specialized video generation,” the researchers said.

“As experiments show, our method performs as well as, and sometimes better than, recent approaches, even though it was not trained on additional video data.”

The new generative AI can be used not only to create videos from text descriptions, but also to modify the appearance of existing videos, for example by restyling a video of a swan with an instruction such as “make it Van Gogh Starry Night style.”

Unlike most research projects, which can take months or years to be rolled out to the public, it won’t be long before the PAIR text-to-video generative AI system is ready for customers. Picsart says it plans to launch new software products built on this generative AI framework in the coming weeks.

Picsart isn’t the only company making progress in text-to-video AI. Google is developing a system of its own, Meta started work on one last fall, and last week Runway unveiled its second-generation text-to-video generator, which was the first to be released publicly.

Image credit: PAIR
