Google DeepMind is releasing a new version of its AI “world” model, called Genie 3, which can generate 3D environments that users and AI agents can interact with in real time. The company also promises that users will be able to interact with these worlds for much longer than before, and that the model will actually remember where things are when you look away from them.
World models are a type of AI system that can simulate environments for purposes such as education and entertainment, or to help train robots and AI agents. You give a world model a prompt, and it generates a space you can move around in as you would in a video game. But instead of being handcrafted from 3D assets, the world is generated entirely by AI. This is an area Google is putting a lot of effort into. The company unveiled Genie 2, which can create interactive worlds based on an image, in December, and it is building a new world model team led by a former co-lead of OpenAI’s Sora video generation tool.
However, these models still have many drawbacks. Genie 2’s worlds, for example, were playable for only up to a minute. I recently tried “interactive video” from a company backed by a Pixar co-founder, and it felt like walking through a blurry version of Google Street View.
Genie 3 looks like it could be a significant step forward. According to a blog post, users can generate worlds with a prompt that supports “a few minutes” of continuous interaction. Google says Genie 3 can remember where things are for about a minute. So if you look away from something in a world and then look back, details like paint on a wall or writing on a chalkboard will still be in the same place. The worlds also render at 720p and run at 24fps.
DeepMind is also adding what it calls “promptable world events” to Genie 3. Using a prompt, you can do things like change the weather conditions in a world or add new characters.
However, this probably isn’t a model you can try out yourself. Google says it is launching as a “limited research preview” available to “a small cohort of academics and creators,” so that its developers can better understand the risks and how to mitigate them appropriately. There are also many limitations, such as the limited ways users can interact with the generated worlds, and the fact that legible text is often generated only when it is provided in the input world description. Google says it is “exploring” how to bring Genie 3 to “additional testers.”
