
On Sunday, Runway unveiled a new AI video synthesis model called Gen-3 Alpha. It is still under development, but it appears to create video of similar quality to OpenAI's Sora, which debuted earlier this year (and has yet to be released). The model can generate novel, high-definition video from text prompts that range from realistic humans to surreal monsters stomping through the countryside.
Unlike Runway's previous best model from June 2023, which could only create two-second clips, Gen-3 Alpha can reportedly create 10-second video segments of people, places, and things with a consistency and coherence that easily surpass Gen-2. If 10 seconds seems short compared to Sora's full minute of video, consider that Runway is working with a shoestring compute budget compared to the more lavishly funded OpenAI, and that it has a track record of actually shipping video generation capabilities to commercial users.
Gen-3 Alpha doesn't generate audio to accompany its video clips, and temporally coherent generations (those that keep a character consistent over time) most likely depend on similar high-quality training material. Still, the improvement in Runway's visual fidelity over the past year is hard to ignore.
AI Video Gets Hot
The AI research community has been busy with AI video synthesis in recent weeks, including the announcement of Kling, a Chinese model developed by Beijing-based Kuaishou Technology (sometimes referred to as “Kwai”) that claims to be able to generate two minutes of 1080p HD video at 30 frames per second with a level of detail and consistency that rivals Sora.
Gen-3 Alpha prompt: “A woman's subtle reflection in the window of a train traveling at lightning speed through a Japanese city.”
Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI's Luma Dream Machine. These videos were novel and weird but generally lacked coherence. We tried Dream Machine ourselves and weren't impressed by anything we saw.
Meanwhile, one of the original text-to-video pioneers, New York-based Runway, founded in 2018, recently found itself the butt of memes showing its Gen-2 technology falling out of favor compared to newer video synthesis models. That may have helped spur the announcement of Gen-3 Alpha.
Gen-3 Alpha prompt: “Astronauts running through the streets of Rio de Janeiro.”
Generating realistic humans has always been tricky for video synthesis models, so Runway is specifically showing off Gen-3 Alpha's ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. The company's provided examples aren't particularly expressive (mostly people slowly staring and blinking), but they do look realistic.
The human examples Runway provided include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running.
Gen-3 Alpha prompt: “A close-up shot of a young woman driving a car with a pensive look on her face, with a blurry green forest visible through the rain-soaked car window.”
The demo videos also include more surreal video synthesis examples, such as a giant creature walking through a ruined city, a man made of rocks walking through a forest, and even the giant cotton candy monster seen below, which is probably the best video on this entire page.
Gen-3 Alpha prompt: “A giant humanoid made of fluffy blue cotton candy stomps along the ground and roars into the sky. There's a clear blue sky behind him.”
Gen-3 will power a range of Runway's AI editing tools (among the company's best-known features), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. The model can create videos from either text or image prompts.
Runway says Gen-3 Alpha is the first in a series of models trained on its new infrastructure designed for large-scale multimodal training, marking a step toward what it calls “General World Models”: hypothetical AI systems that build an internal representation of an environment and use it to simulate future events within that environment.
