Genie 3 + Maps: AI Turns Street View into a Playable World
Genie 3 turns 280 billion Street View images into on-demand playable worlds. How it works, its limits, and its use at Waymo.
by Cleverson

Genie 3 has just received an integration that changes what we mean by a map: Google DeepMind's world model now draws on the Google Maps Street View archive to generate playable simulations anchored to real addresses. Announced on 19 May 2026, the feature turns 280 billion images into environments you can navigate in real time, and it raises concrete questions about the future of AI training, generative games, and even robotaxis.
TL;DR
- Genie 3 is Google DeepMind's world model: it generates interactive video at 720p 24fps from text.
- Google connected Street View to Genie via Maps Imagery Grounding: you pin a location on the map and the model creates a playable scene of that place.
- Available to Google AI Ultra subscribers (US$200/month) aged 18+; the geographic pin only works in US locations for now.
- Waymo already uses Genie 3 to train robotaxis in rare scenarios — from blizzards to elephants on the road.
- There are still limits: the model doesn't understand physics, 'forgets' after about 1 minute, and the result looks like a game, not a photo.
What is Genie 3 and why it matters
Genie 3 is what DeepMind calls a world model — an AI that doesn't return a sentence or a static image, but an interactive environment, controllable frame by frame. You describe a scene (text) or point to a map location (image) and the model generates the next frames in real time, reacting to your inputs like a video game.
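To make "generates the next frames in real time, reacting to your inputs" concrete, here is a minimal sketch of the control flow only. `ToyWorldModel`, `start`, `step` and `MOVES` are hypothetical names, not Genie 3's interface (there is no public API), and a "frame" is a tiny dict rather than a 720p image. What it shows is the auto-regressive loop: the first frame is conditioned on either a text prompt or a seed image, and each later frame is produced from the history so far plus the latest user action.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from user input to movement on a 2-D grid.
MOVES = {"forward": (0, 1), "back": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class ToyWorldModel:
    """Hypothetical toy: mirrors a world model's control flow, not Genie 3's API."""
    history: list = field(default_factory=list)

    def start(self, prompt=None, seed_image=None):
        # The first frame is conditioned on EITHER a text prompt OR a real
        # image (the Street View case), never both and never neither.
        if (prompt is None) == (seed_image is None):
            raise ValueError("pass exactly one of prompt or seed_image")
        source = "image" if seed_image is not None else "text"
        self.history = [{"t": 0, "x": 0, "y": 0, "source": source}]
        return self.history[0]

    def step(self, action):
        # Auto-regressive step: the new frame is computed from the frames
        # generated so far (here, only the latest) plus the user's input,
        # then appended so it conditions the following step.
        prev = self.history[-1]
        dx, dy = MOVES[action]
        frame = {"t": prev["t"] + 1, "x": prev["x"] + dx,
                 "y": prev["y"] + dy, "source": prev["source"]}
        self.history.append(frame)
        return frame


model = ToyWorldModel()
model.start(prompt="a desert town at dusk")
for action in ["forward", "forward", "left"]:
    frame = model.step(action)
print(frame)  # {'t': 3, 'x': -1, 'y': 2, 'source': 'text'}
```

In a real world model, `step` is a large neural network that looks back over many past frames; the toy reads only the last one, which is enough to show why each input changes what comes next.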
Originally announced in August 2025 as a research preview, Genie 3 has only now reached the general public in May 2026, with a strategic choice: instead of competing head-to-head with Sora or Veo in linear video generation, Google positioned Genie as a simulator. The practical difference? AI-generated video is cinema — you watch. A world model is a theme park — you walk inside.
For developers, this opens up an unprecedented product class: infinite, cheap, context-aware environments generated from a prompt. I've been following world models since the first Genie, and the quality leap here is striking. Genie 2 lost coherence within about 10 seconds; Genie 3 keeps the environment largely consistent for minutes, even though its detailed visual memory reaches back only about one minute.
How Genie 3 uses Street View: step by step
The public flow is simple and runs inside Google Labs, at labs.google/projectgenie/. The steps:
- You open the experiment in Labs with an active Ultra account.
- Drop a pin on a US address within the embedded Google Maps.
- Choose an optional style (Desert Sands, Stone Age, Ocean World, B&W film, etc.).
- Describe a character: a comic-book hero, an animal, or a claymation figure.
- Genie 3 loads the panoramic Street View image of that point, aligns the topology, and generates the first frame of the simulation. From there, you walk, look around, change the weather.
Internally, the key is a technology Google calls Maps Imagery Grounding. Instead of "imagining" the geography from scratch, the model receives the real image as a spatial seed. Street View acts as the anchor; everything beyond it is generated.
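Genie's internal pipeline for fetching imagery is not public. As a stand-in for "loading the panoramic image at a pin", the sketch below builds requests for Google's public Street View Static API, which returns one flat view per request; four headings at a 90° field of view together cover the full 360° around the pinned point. The coordinates and `YOUR_API_KEY` are placeholders, `panorama_tile_urls` is a hypothetical helper, and nothing here implies Genie uses this API internally.

```python
from urllib.parse import urlencode

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def panorama_tile_urls(lat, lng, api_key, fov=90, size="640x640"):
    """Build Street View Static API URLs whose views together cover 360 degrees.

    With a horizontal field of view of `fov` degrees, 360 / fov headings are
    needed; fov=90 gives the four compass directions 0, 90, 180 and 270.
    """
    if 360 % fov != 0:
        raise ValueError("fov must divide 360 evenly")
    urls = []
    for heading in range(0, 360, fov):
        params = {
            "size": size,              # image size in pixels, "WIDTHxHEIGHT"
            "location": f"{lat},{lng}",
            "heading": heading,        # compass direction of the camera
            "fov": fov,                # horizontal field of view in degrees
            "pitch": 0,                # level camera
            "key": api_key,
        }
        urls.append(f"{STREETVIEW_ENDPOINT}?{urlencode(params)}")
    return urls


for url in panorama_tile_urls(37.7749, -122.4194, "YOUR_API_KEY"):
    print(url)
```

Building the URLs makes no network call; fetching them requires a Maps Platform key with the Street View Static API enabled.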
Technical specifications: 720p, 24fps and the one-minute wall
Looking at the raw numbers released by DeepMind:
| Feature | Genie 2 | Genie 3 |
|---|---|---|
| Resolution | 360p | 720p |
| Frame rate | ~12fps | 24fps |
| Visual memory | ~10s | ~60s |
| Generation | Auto-regressive per frame | Auto-regressive per frame |
| 3D representation | Implicit | Implicit (no NeRF/Gaussian Splatting) |
| Interaction | Limited | Real-time |
The choice not to use NeRF or Gaussian Splatting is deliberate. These methods require explicit 3D reconstruction — expensive, slow, dependent on prior scanning. Genie 3 generates everything 'frame by frame based on the world description and actions', as the official paper describes. This trades perfect geometric consistency for radical flexibility: any prompt becomes a world.
The one-minute wall is the most annoying limitation. After about 60 seconds of walking, the model starts to forget what was behind you. If you turn all the way around, there's a good chance the landscape has changed. For short games and demos, it works. For a two-hour RPG session, not yet.
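Some quick arithmetic shows the scale of that wall: at 24 fps, one minute is 1,440 frames, versus roughly 120 frames for Genie 2 (about 12 fps for about 10 s). The sketch below models visual memory as a fixed-size first-in, first-out buffer. That is an assumption for illustration only: DeepMind has not said Genie 3's memory works this way, but the analogy captures why a view you saw more than a minute ago no longer constrains what the model draws. `SlidingMemory` and `frames_in_memory` are hypothetical names.

```python
from collections import deque


def frames_in_memory(fps, memory_seconds):
    """Number of frames a fixed-length visual memory can hold."""
    return fps * memory_seconds


class SlidingMemory:
    """Toy model of the 'one-minute wall' as a fixed-size frame buffer.

    Assumption: this is an analogy, not Genie 3's actual memory mechanism.
    """

    def __init__(self, fps=24, memory_seconds=60):
        self.buffer = deque(maxlen=frames_in_memory(fps, memory_seconds))

    def observe(self, frame_id):
        # Appending to a full deque with maxlen silently drops the oldest frame.
        self.buffer.append(frame_id)

    def remembers(self, frame_id):
        return frame_id in self.buffer


print(frames_in_memory(12, 10))  # Genie 2: 120 frames
print(frames_in_memory(24, 60))  # Genie 3: 1440 frames

mem = SlidingMemory()
for frame_id in range(24 * 61):  # walk for 61 seconds at 24 fps
    mem.observe(frame_id)
print(mem.remembers(0))        # False: the first second has been evicted
print(mem.remembers(24 * 60))  # True: still inside the last minute
```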
Maps Imagery Grounding: the heart of the integration
The big technical feat is not rendering the starting point well; Street View has been doing that for nearly two decades. It's maintaining spatial coherence as you move away. Jonathan Herbert, director of Google Maps, put it plainly to TechCrunch: the advance is not "faithful reconstruction" but "spatial continuity". Genie 3 remembers the neighbourhood in 360° and builds the next streets from that base.
The archive is colossal:
- 280 billion images captured
- 110 countries covered
- 7 continents mapped
- Nearly 20 years of cumulative collection
For Genie, this archive is a training dataset that no competitor can replicate — not Meta, not OpenAI, not xAI. This is the first time, in practice, that 'Google having Street View' has become a direct advantage in generative AI, not just local search.
Why this changes the game for Waymo
The Genie 3 + Street View integration already has an internal customer consuming it: Waymo, Alphabet's robotaxi division. Waymo's team uses Genie 3 to generate rare scenarios that would be costly or impossible to film on the street:
- Tornadoes in urban areas
- Large animals crossing the road (elephants, in an example cited by TechCrunch)
- Snowstorms in cities where it never snows
- Erratic pedestrian behaviour in atypical conditions
The logic is simple: an autonomous driving system only becomes safe if tested on edge cases. And edge cases, by definition, are rare — collecting real data takes decades. Training in an AI-simulated world accelerates that to weeks. With Street View in the loop, Waymo takes a specific corner in Phoenix or San Francisco and runs a thousand variants of 'what if it hailed now?' on that real geometry.
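The "thousand variants" idea is essentially a combinatorial grid over one fixed location. A minimal sketch, assuming hypothetical factor lists (these are not Waymo's categories, and Waymo's pipeline is not public): each factor multiplies the number of scenarios, so 10 weather conditions × 10 obstacles × 10 times of day would already yield 1,000 runs on the same real corner. The example below uses smaller lists to stay readable.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scenario:
    location: str     # the fixed, real anchor (e.g. one Street View pin)
    weather: str
    obstacle: str
    time_of_day: str


def scenario_grid(location, weathers, obstacles, times):
    """All combinations of the variable factors, on one fixed location."""
    return [Scenario(location, w, o, t)
            for w, o, t in product(weathers, obstacles, times)]


# Hypothetical factor lists, for illustration only.
weathers = ["clear", "heavy rain", "hail", "snowstorm", "tornado"]
obstacles = ["none", "elephant", "erratic pedestrian", "fallen tree"]
times = ["day", "night"]

grid = scenario_grid("hypothetical Phoenix intersection", weathers, obstacles, times)
print(len(grid))  # 5 * 4 * 2 = 40 scenarios
```

Keeping the location fixed while the other factors vary is what makes the comparison useful: every failure can be traced to one changed condition on known geometry.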
This also has strong implications for Voyia, the school management platform we maintain here at Agathas: the same principle, training agents in a simulated environment before releasing them into production, applies to any AI that needs to handle rare scenarios. You can read how we approach technology infrastructure for Moodle in our post on the custom app; the "validate in a controlled environment first" standard is the same.
Practical applications: games, training and education
Beyond the hype, the concrete applications that will hit the market first are:
- Generative games: on-demand scenarios instead of manual level design
- Agent training: AI learning policy in new worlds every episode
- Immersive education: historical walkthroughs (walking through Ancient Rome from the current map)
- Tourist preview: destination visualisation in the city's style
- Simulated robotics: arms and drones training in a variety of scenarios
- Cinema and VFX: pre-visualisation of scenes generated from real locations
The most immediate case is for indie game studios. Today, creating an open world requires a team of level designers. With Genie 3, a solo designer prototypes in a day. It doesn't replace AAA production, but it removes the entry barrier for experimentation.
For those working with paid traffic and product, there's another window: interactive ads. Imagine an ad where the user "walks" through your store's virtual shop window instead of scrolling a static carousel. It's not something I'm building, but the path is open and worth following closely, especially if you handle local offers, as we discussed in the post about business WhatsApp and the Official API.
Current limits: physics, hallucinations and photorealism
Let's be realistic. The limitations that DeepMind itself admits:
- Non-existent physics: in one demo, a woman running in Joshua Tree passes straight through cacti and bushes. No collision.
- Broken text: signs, billboards and anything written renders as scribbles.
- Geographic hallucination: the corner is recognisable, but surrounding details shift as you move.
- Limited multi-agent: two characters controlled simultaneously still don't work well.
- Photorealism: the result looks like a game, not a film. Jack Parker-Holder, a DeepMind researcher, estimates the gap to video quality (Veo, Sora) is 'six to 12 months'.
For cases like robotaxi simulation, the lack of physics is serious. Training a car to respect pedestrians in a world where pedestrians walk through walls can introduce dangerous biases. Waymo, of course, uses Genie in combination with other physics simulators — not as the sole source of truth.
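What Genie 3 lacks here is even the most basic physical constraint: solid objects should not overlap. The sketch below is a hypothetical plausibility check of the kind a separate physics layer could apply to a generated clip, flagging the first frame where an agent's axis-aligned bounding box penetrates a static obstacle (the cactus in the Joshua Tree demo, say). The `Box` type, coordinates and function names are illustrative assumptions, not part of Genie or of Waymo's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box on the ground plane, in metres."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def overlaps(self, other):
        # Two boxes intersect unless one lies entirely to one side of the other.
        # Boxes that merely touch along an edge are not counted as overlapping.
        return not (self.x_max <= other.x_min or other.x_max <= self.x_min or
                    self.y_max <= other.y_min or other.y_max <= self.y_min)


def first_penetration(agent_track, obstacles):
    """Index of the first frame where the agent overlaps a solid obstacle,
    or None if the track passes this simple check."""
    for i, agent in enumerate(agent_track):
        if any(agent.overlaps(ob) for ob in obstacles):
            return i
    return None


cactus = Box(2.0, -0.5, 3.0, 0.5)
runner = [Box(x, -0.25, x + 0.5, 0.25) for x in (0.0, 1.0, 2.0, 3.0)]
print(first_penetration(runner, [cactus]))  # 2: the runner enters the cactus
```

A real physics simulator would resolve the collision (stopping or deflecting the runner) rather than only detect it, but detection alone is enough to reject implausible clips before they reach a training set.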
How to access Genie 3 with Street View
The full package requires three conditions:
- Active Google AI Ultra subscription (US$200/month as of this post)
- 18 years or older (verification via Google account)
- Access to Google Labs at labs.google/projectgenie/
Text-based generation is already available globally. The map pin works only in the US; Google has signalled expansion, with no confirmed date for Brazil. Those outside the US can use Genie 3 without Street View anchoring, generating purely prompt-driven worlds.
Important: Labs is an experimental showcase. Public APIs to integrate Genie 3 into your own products don't exist yet. Anyone wanting to build SaaS on top of this needs to wait — likely via Vertex AI in the coming months.
Genie 3 vs. competitors: Veo, Sora and GameNGen
| Model | Type | Interactive? | Resolution | Stable duration |
|---|---|---|---|---|
| Genie 3 (DeepMind) | World model | Yes, real-time | 720p @ 24fps | ~1 min |
| Veo 3 (Google) | Video generation | No | 1080p | Up to 60s linear |
| Sora (OpenAI) | Video generation | No | 1080p | Up to 20s linear |
| GameNGen (Google) | Game simulation | Yes (Doom only) | 720p | Indefinite (closed game) |
Genie 3 is the only one on the list that combines three things: real-time interactivity, open world, and real data via Street View. Veo and Sora generate prettier clips, but don't respond to input. GameNGen interacts, but only within a specifically trained game.
It's common to confuse Genie with Veo. The rule I use: if you're going to watch, it's Veo (or Sora). If you're going to walk inside, it's Genie.
What to expect in the next 12 months
Looking at the implicit roadmap in official announcements:
- Geographic expansion of the Street View pin outside the US, with Brazil a likely candidate given its existing Street View coverage.
- Physics improvement — collision and gravity are declared priorities.
- API/Vertex AI — opening for developers to build products.
- Extended memory — moving beyond the 1-minute wall to hours.
- Multi-agent — multiple characters controlled simultaneously.
If I had to bet on where this takes off first at scale, I'd say automotive simulation. Not just for Waymo: every manufacturer selling ADAS systems will want infinite simulation, from Volkswagen to BYD and Stellantis. Real-track testing is so expensive that any 10x gain in iteration speed pays for the licensing.
Second, generative games. Not to replace AAA, but for a new niche of short experiences like 'playable TikTok' — walking through your favourite character's neighbourhood in claymation style, for example.
Conclusion: from static map to dynamic world
For nearly two decades, Street View was a reference product — you consulted it to see what a place was like. With Genie 3, it becomes simulation raw material: a database that feeds on-demand playable worlds.
For developers in Brazil, the direct impact is still small (no local pin, no public API), but it's worth following. The combination of Street View and a world model is the kind of competitive advantage that only a company with nearly two decades of panoramic imagery from 110 countries can offer. When this becomes an API, it will redefine how we build any product that involves space, from logistics to augmented reality.
For Ultra subscribers in the US: it's worth testing. For the rest of us, it's time to study.