Text-to-video generative AI is finally here



I like my AI the way I like my foreign cheese varieties: incredibly weird and full of holes, the kind that leaves most definitions of “good” up to personal taste. So I was amazed as I explored the next frontier of public AI models: text-to-video generators that are every bit as strange as the AI-generated Seinfeld parody “Nothing, Forever” was when it first released.

Runway, one of the two startups that helped give us the AI art generator Stable Diffusion, announced on Monday that its Gen-2 AI video model would go live for public testing soon. The company made the startling claim that this was “the first text-to-video model ever published.” Unfortunately, a more obscure group with an earlier text-to-video model may have beaten Runway to the punch.

Google and Meta are already working on their own text-to-video generators, but neither company has been very forthcoming with news since the projects were first teased. Runway, a comparatively small team, is known for its online video editing tools, including its Gen-1 AI model, which creates and transforms existing videos based on text prompts or reference images. Gen-1 could turn a simple rendering of a swimming stick figure into a scuba diver, or use a generated overlay to transform a man walking down the street into a claymation nightmare. Gen-2 is supposed to be the next big step up, letting users create 3-second videos from scratch based on simple text prompts. The company hasn’t let anyone get their hands on it yet, but it has shared several clips based on prompts such as “eye close-up” and “aerial view of mountain scenery.”

Few people outside the company have experienced Runway’s new model so far. If you’re still craving AI video generation, however, there is another option: a text-to-video system called ModelScope, from DAMO Vision Intelligence Lab, the research arm of e-commerce giant Alibaba. Released last weekend as a kind of public test case, it has already made some headlines with its occasionally awkward, often unhinged two-second video clips. The lab says the system uses a fairly basic diffusion model to create its videos, according to the page describing the AI model.

ModelScope is open source and already available on Hugging Face, though it can be difficult to get the system running without paying a small fee to run it on a separate GPU server. Tech YouTuber Matt Wolf has a good tutorial on how to set it up. Of course, you can run the code yourself if you have the technical skill and the VRAM to support it.

ModelScope is pretty blatant about where its data comes from. Many of its generated videos contain the vague outline of the Shutterstock logo, which suggests the training data included a significant portion of videos and images from the stock photo site. It’s a similar problem to the one facing other AI image generators such as Stable Diffusion: Getty Images is suing Stability AI, the company behind that AI art generator, noting how many Stable Diffusion images produce a corrupted version of the Getty watermark.

Of course, that hasn’t stopped some users from making tiny movies with the rather unwieldy AI, such as a pudgy-faced Darth Vader visiting a supermarket, or Spider-Man and a capybara teaming up to save the world.

As far as Runway goes, the group is poised to make a name for itself in the ever-more-crowded world of AI research. Its paper describing Gen-1 says the model was trained on a large dataset of both images and videos, combining text-image data with uncaptioned videos. The researchers found there was simply a lack of video-text datasets of the same quality as the image datasets built from pictures scraped from the internet, which forced the company to derive that data from the videos themselves. It will be interesting to see how Runway’s more refined take on text-to-video stacks up, especially against heavy hitters like Google showing off longer-form narrative videos.

If Runway’s new Gen-2 waitlist is anything like the Gen-1 waitlist, users may have to wait weeks to get their hands on the full system. In the meantime, playing around with ModelScope might be a good first option for those looking for a more exotic take on AI. Of course, we’ll end up having the same conversations about AI-generated video that we’re currently having about AI-generated images.

The slides below are part of my attempt to compare Runway and ModelScope and test the limits of text-to-video generation. I converted the clips to GIF format using the same parameters for each, so the GIFs’ frame rates are close to those of the original AI-created videos.
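For reference, the video-to-GIF step is easy to reproduce. A minimal sketch using Pillow follows; the 8 fps rate, the synthetic frames, and the file name are my own hypothetical stand-ins (with a real clip you would load each extracted video frame instead), not the parameters actually used for the slides.

```python
from PIL import Image

# Hypothetical example: build an 8-frame GIF at 8 fps from synthetic
# solid-color frames standing in for extracted video frames.
fps = 8
frames = [
    Image.new("RGB", (64, 64), (i * 30 % 256, 64, 128)) for i in range(8)
]

# duration is the per-frame display time in milliseconds, so 1000 / fps
# preserves the source frame rate; loop=0 makes the GIF repeat forever.
frames[0].save(
    "clip.gif",
    save_all=True,
    append_images=frames[1:],
    duration=int(1000 / fps),
    loop=0,
)
```

Matching `duration` to the source frame rate is what keeps the GIF playing at roughly the same speed as the original AI-created video.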


