On paper, Video Rebirth is too small to compete in the capital-intensive AI video battlefield. The less than two-year-old startup has raised $80 million in funding and has a team of 30 people across its Singapore headquarters and offices in Hong Kong. In a field where state-of-the-art video models cost tens of millions of dollars to train and even more to run, Video Rebirth should be shut out of the competition.
But just before the general launch of its flagship model in May, the company took a spot on the benchmark leaderboard alongside the tech giants. Video Rebirth’s Bach model debuted at No. 6 on Artificial Analysis’ text-to-video leaderboard, trailing models developed by Alibaba, ByteDance, Kuaishou Technology, and xAI. That makes it the top-ranked startup model, and its price per minute of generated video is the lowest among the top 10.
“For a team of our size, this was a strong signal that our architectural approach was working,” said Liu Wei, co-founder and CEO of Video Rebirth.
According to Liu, developing an AI video engine is just the opening act. He aims to build models that can generate hyper-realistic worlds by training AI to create visuals that are not just lifelike, but bound by the laws of physics. This is a high-stakes field pursued by tech giants like Google, Meta, and OpenAI, all of whom are racing to develop so-called world models that have the potential to disrupt industries from self-driving cars and robotics to gaming. Liu said he is building a “truly meaningful” world model that can understand its environment and simulate what will happen next, in the same way humans predict outcomes based on common sense and instinct.
“We do video generation to build world models,” Liu says. “Within three years, we will prove that we can simulate the physical world in real time.”
Beer commercial demo generated by Video Rebirth’s Bach model.
To get there, Video Rebirth closed a seed round totaling $80 million at an undisclosed valuation in March. Investors in the round included AMD Ventures, the venture capital arm of American AI chip developer Advanced Micro Devices, which is led by billionaire Lisa Su; ZER01NE, the venture capital arm of South Korean automaker Hyundai Motor Group, which is led by billionaire Euisun Chung; Hiven, an investment company affiliated with CJ Group, the Korean food and entertainment conglomerate of billionaire Lee Jae-hyun; Korean game developer Actoz Soft; Shanghai-based Qiming Venture Partners; and Gaw Capital, a Hong Kong-based private equity firm chaired by billionaire Goodwin Gaw.
Video Rebirth says it will raise new funding in July, but has not provided further details.
“Our rationale is based on the belief that video generation is much more than a tool for content creation. It represents one of the clearest and most viable paths towards a world model,” said Fan Wei, senior investment manager at Hyundai Cradle, a program of ZER01NE. “Video Rebirth has shared this very vision since its inception, positioning its technology to unlock important future applications in physical AI.”
Video Rebirth’s Bach targets corporate clients in advertising, entertainment, film production, and gaming. A distinctive feature of the model is its ability to generate up to 45 seconds of multi-shot video from reference images and text prompts. By comparison, ByteDance’s Seedance 2.0, a popular model for multi-shot AI video generation released in February, is limited to 15 seconds, though it also accepts video and audio input. Other Bach features include the ability to create clips of up to 10 seconds from text or images, and to animate static characters based on reference videos.
Video Rebirth competes in a field that is not only crowded, but also expensive to operate, as producing video requires far more computing power than text. The financial toll of the AI video race became clear with OpenAI’s sudden decision to shut down its Sora platform in March, even though the mobile app had amassed nearly 10 million downloads since its release last September and the company had secured a $1 billion equity and licensing deal with Walt Disney (since canceled). In November, Forbes estimated that OpenAI was spending approximately $15 million per day, at about $1.30 each, to churn out millions of 10-second videos in response to user requests.
“OpenAI was held back by the cost of inference,” Liu says, referring to the stage where a trained AI model is put to use. He added that the cost for Bach to produce a 10-second clip is “significantly lower than other frontier models,” but declined to share exact numbers, citing competitive sensitivity. The startup was able to reduce inference costs thanks to a proprietary technology that can speed up the video generation process by up to 10 times. Called multi-step sampling loss, it is a mathematical technique that trains a model to predict and correct errors during the generation process, so fewer steps are required to create the final video. In contrast, most traditional models cannot anticipate such errors and take longer to run, Liu said.
Financial efficiency also extends to training costs. Liu claims that Bach required “a fraction” of the budget for a comparable frontier model, but did not elaborate further. He says the company was able to do this by training on fewer, higher-quality videos, including licensed movies and music videos, as well as clips shot in-house, most of them at 720p resolution. Bach is also designed to split the tasks of prompt adherence and visual generation, unlike other models that rely on a single “brain” to handle both tasks. This division of labor improves computational efficiency, Liu explains.
In response to Liu’s claims, an OpenAI spokesperson said in an email: “As computing demands increase, the Sora research team is refocusing on world simulation research to advance robotics and real-world physical tasks.”
A cinematic clip of a man running away from a monster, generated by Video Rebirth’s Bach model.
Liu said Video Rebirth stands out not only for its cost savings, but also for its ability to generate videos that follow physical laws governing gravity, object collisions, and lighting. Physical realism has become a significant bottleneck in the industry, where objects in AI-created clips can morph or look uncanny. He added that his AI is particularly good at maintaining product consistency, which is a top priority for e-commerce advertisers, and is also good at generating facial expressions and landscape shots for filmmakers. In announcing its investment in Video Rebirth in March, Hiven said it looked forward to collaborating with the startup across CJ’s businesses, including CJ ENM, an entertainment unit that produces K-dramas and movies.
Video Rebirth’s advantage lies in its “focus on enterprise-grade control and consistency,” says Hyundai’s Huang. He added that the startup is tackling some of the toughest challenges in video generation, including AI’s ability to understand cause and effect and how things move through space and time.
Alex Zhou, managing partner at Qiming Venture Partners, says Video Rebirth has the potential to “become the standard tool for professional content creation across industries such as film, advertising, gaming, and e-commerce” within the next five years, just as “Adobe has been in the traditional creative software industry.”
Video Rebirth is working on a world model that can create interactive 3D environments on the fly from text prompts, using technology that produces objects and environments that are not only beautiful but also realistic and physically accurate. Unlike traditional 3D simulations, which require writing lines of code and can only react to what has been programmed in advance, a world model is an AI that understands the physical properties of the real world and simulates what will happen next, even in situations it has never seen before.
World models are still in their infancy, but a growing number of companies are betting the technology can be used to train self-driving cars to deal with unexpected situations, teach robots to act intelligently, and speed up the development of video games. In January, Google began rolling out Genie 3, which lets users generate environments that can be navigated using arrow keys and trigger new events through prompts (such as adding new objects). Genie 3 supports interactions of just a few minutes, but its release caused a decline in gaming stocks across the board, including Unity Software, over concerns that the technology would make traditional game engines obsolete. The world model is now being used by Alphabet’s self-driving division, Waymo, to test self-driving cars in a variety of scenarios, from natural disasters to rare events like a broken-down truck blocking a road.
Other companies developing world models range from big tech companies like Alibaba, Nvidia, and OpenAI to well-funded startups like Google-backed Runway and World Labs, co-founded by AI pioneer Fei-Fei Li.
Alec Rubel, a Los Angeles-based associate partner at McKinsey & Co., says world models sit somewhere between hype and game-changer. “Today’s world models are mostly in early stages of development. They represent an important frontier for AI, but have not yet reached the level of fidelity or cost profile needed for widespread deployment across industries.”
Liu plans to prove that Video Rebirth’s world model is a game-changer, with the startup aiming to launch it by the end of 2026. Liu said the model, called Olympus, works similarly to Genie 3, but can also generate environmental sounds such as impacts and footsteps. ZER01NE said in a March announcement that it sees Video Rebirth as a “key partner for the future of mobility” with the potential to use its technology to “train physical AI within a hyper-realistic digital world.” Hyundai Motor is a major player in self-driving cars and owns American robot maker Boston Dynamics.
“As we scale up our world model, we can simulate increasingly complex physical scenarios in real time,” Liu says. “Once that happens, the world model will no longer be limited to games and embodied AI. We will be able to work on a wide range of industrial applications.”
Liu’s ambition to develop a world model was sparked in early 2024 when OpenAI announced the Sora video model, which the AI giant dubbed a “world simulator.” At the time, Liu was a Distinguished Scientist at Tencent (a senior title the Chinese tech giant gives to elite researchers) who had led the development of the company’s Hunyuan AI model from the ground up, and he saw where the industry was heading.
“Even though it was only 2024, the large language model space felt very crowded, with big tech companies already entrenching themselves,” Liu says. “Physical AI, on the other hand, was a completely blank canvas. Sora convinced everyone that the physical world could be simulated, even if it seemed incredibly difficult at the time.”
Liu was confident that he could make the simulation a reality, and he had the qualifications to back up that belief. Armed with a Ph.D. in computer science and electrical engineering from Columbia University, he has been researching machine learning since 2007, drawn to the field by his interest in mathematics. He joined Tencent in 2016 after many years in research positions at IBM and Chinese ride-hailing giant Didi Chuxing, as well as teaching at Rensselaer Polytechnic Institute and Stevens Institute of Technology in the United States.
“Wei is a rare founder who combines world-class research capabilities with deep industry experience,” said Qiming’s Zhou. “He has consistently been one of the technology experts I trust most when it comes to AI. Whenever there was a major advance in AI models, I often sought his perspective early on. I know many executives in the technology industry have done the same.”
Recognizing the opportunity in physical AI, Liu quit his high-paying job at Tencent in September 2024 to start Video Rebirth. To found the company, he assembled a team of co-founders including Lu Difu, former head of Tencent AI Lab, Liu Peng, a former quantitative developer at JPMorgan Chase & Co., and Dan Cong, who was a director of the 42X Fund, the investment fund of Abu Dhabi-backed AI company G42.
It took more than 20 years for large language models to reach the mainstream after their initial advances, with an academic paper outlining the blueprint published in 2003, but Liu predicts that the path to mass adoption of world models will be even longer. He expects the next 12 months to focus primarily on technological advances within the lab.
Still, Liu is undaunted by the timeline. “I will devote my absolute and undivided energy to research and development until we succeed in building a commercially viable world model,” he declares. “That day will definitely come.”
