Genie 3 World Model Produces Minutes of Video in Real Time

By Paula Parisi
August 15, 2025

Google DeepMind has unveiled Genie 3, a world-building model that uses text and image prompts to generate 3D environments in real time. Still in research preview, Genie 3 can output “several minutes” of video that users can navigate in real time at 24fps and 720p resolution. Because it remembers the rules of the world it creates, Genie 3 lets agents predict how the environment evolves and how their actions affect it. Google says world models are “a key steppingstone” to artificial general intelligence, or AGI, since they can train AI agents in “an unlimited curriculum of rich simulation.”

Google DeepMind Research Director Shlomi Fruchter calls Genie 3 “the first real-time interactive general-purpose world model,” explaining to TechCrunch that it can generate both realistic and imaginary worlds “and everything in between.”

Genie 3 lacks the fidelity of Nvidia’s photoreal Omniverse world-building tool, which simulates Newtonian physics and is used in engineering workflows, digital twins and film production, but it also avoids Omniverse’s steep learning curve. Genie 3’s resolution and capabilities make it suitable for use cases such as game visualization, AI agent training, and basic robotic simulation and prototyping.

“DeepMind says the model teaches itself how the world works — how objects move, fall, and interact — by remembering what it has generated and reasoning over long time horizons,” writes TechCrunch.

“With Genie 3, all it takes is a prompt or image to create an interactive world,” emphasizes Ars Technica, adding that “since the environment is continuously generated, it can be changed on the fly. You can add or change objects, alter weather conditions, or insert new characters.”

In an explainer video, DeepMind refers to these as “promptable events.” A blog post by Google DeepMind explains that worlds generated by Genie 3 are “dynamic and rich because they’re created frame by frame based on the world description and actions by the user.”

“This provides an opportunity to refine how AI models — including so-called ‘embodied agents’ — behave when they encounter real-world situations,” according to Ars Technica. Embodied agents are AI systems that have a body, real or simulated, and interact with their environment.

Genie 3 builds on its predecessor, Genie 2, which generated video-game-like worlds lasting up to a minute, and on Google’s Veo 3 video generation model. Like Veo 3, it takes a machine learning approach to physics, relying on vast amounts of training data to plausibly simulate real-world behavior by reproducing the patterns in that data.
