The Trick That Makes Google's Self-Driving Cars Work
Google is engaging in unprecedented, massive, ongoing data collection to transform intractable problems into solvable chores.
Google's self-driving cars can tour you around the streets of Mountain View, California.
I know this. I rode in one this week. I saw the car's human operator take his hands from the wheel and the computer assume control. "Autodriving," said a woman's voice, and just like that, the car was operating autonomously, changing lanes, obeying traffic lights, monitoring cyclists and pedestrians, making lefts. Even the way the car accelerated out of turns felt right.
It works so well that it is, as The New York Times' John Markoff put it, "boring." The implications, however, are breathtaking.
Perfect, or near-perfect, robotic drivers could cut traffic accidents, expand the carrying capacity of the nation's road infrastructure, and free up commuters to stare at their phones, presumably using Google's many services.
But there's a catch.
Today, you could not take a Google car, set it down in Akron or Orlando or Oakland and expect it to perform as well as it does in Silicon Valley.
Here's why: Google has created a virtual track out of Mountain View.
The key to Google's success is that its cars aren't forced to process an entire scene from scratch. Instead, Google's teams drive and map each road the car will travel. And these are not any old maps. They are not even the rich, road-logic-filled maps of consumer-grade Google Maps.
They're probably best thought of as ultra-precise digitizations of the physical world, all the way down to tiny details like the position and height of every single curb. A normal digital map shows that an intersection exists; these maps capture its geometry to a precision measured in inches.
But the "map" goes beyond what any of us know as a map. "Really, [our maps] are any geographic information that we can tell the car in advance to make its job easier," explained Andrew Chatham, the Google self-driving car team's mapping lead.
"We tell it how high the traffic signals are off the ground, the exact position of the curbs, so the car knows where not to drive," he said. "We'd also include information that you can't even see like implied speed limits."
Google has created a virtual world out of the streets its engineers have driven. The data for a route is pre-loaded into the car's memory before it sets off, so that as it drives, the software knows what to expect.
"Rather than having to figure out what the world looks like and what it means from scratch every time we turn on the software, we tell it what the world is expected to look like when it is empty," Chatham continued. "And then the job of the software is to figure out how the world is different from that expectation. This makes the problem a lot simpler."
The pre-built map might make the in-car problem simpler, but it vastly increases the amount of work required for the task. A whole virtual infrastructure needs to be built on top of the road network!
Very few companies, maybe only Google, could imagine digitizing all the surface streets of the United States as a key part of the solution to self-driving cars. Could any traditional carmaker claim that kind of data collection and synthesis as a core competency?
Chris Urmson, a former Carnegie Mellon professor who runs Google's self-driving car program, oozed confidence when asked about mapping every single street where a Google car might want to operate. "It's one of those things that Google, as a company, has some experience with our Google Maps product and Street View," Urmson said. "We've gone around and we've collected this data so you can have this wonderful experience of visiting places remotely. And it's a very similar kind of capability to the one we use here."
So far, Google has mapped 2,000 miles of road. The US road network has something like 4 million miles of road.
"It is work," Urmson added, shrugging, "but it is not intimidating work." That's the scale at which Google is thinking about this project.
All this makes sense within the broader context of Google's strategy. Google wants to make the physical world legible to robots, just as it had to make the web legible to robots (or spiders, as they were once known) so that they could find what people wanted in the pre-Google Internet of yore.
In fact, it might be better to stop calling what Google is doing mapping, and come up with a different verb to suggest the radical break they've made with previous ideas of maps. I'd say they're crawling the world, meaning they're making it legible and useful to computers.
Self-driving cars sit perfectly in between Project Tango—a new effort to "give mobile devices a human-scale understanding of space and motion"—and Google's recent acquisition spree of robotics companies. Tango is about making the "human-scale" world understandable to robots, and the robotics companies are about creating the means for taking action in that world.
The more you think about it, the more the goddamn Googleyness of the thing stands out: the ambition, the scale, and the type of solution they've come up with to this very hard problem. What was a nearly intractable "machine vision" problem, one that would require close to human-level comprehension of streets, has become a much, much easier machine vision problem thanks to a massive, unprecedented, unthinkable amount of data collection.
Last fall, Anthony Levandowski, another Googler who works on self-driving cars, went to Nissan for a presentation that immediately devolved into a Q&A with the car company's Silicon Valley team. The Nissan people kept hectoring Levandowski about vehicle-to-vehicle communication, which the company's engineers (and many in the automotive industry) seemed to see as a significant part of the self-driving car solution.
He parried all of their queries with a speed and confidence just short of condescension. "Can we see more if we can use another vehicle's sensors to see ahead?" Levandowski rephrased one person's question. "We want to make sure that what we need to drive is present in everyone's vehicle and sharing information between them could happen, but it's not a priority."
What the car company's people couldn't or didn't want to understand was that Google does believe in vehicle-to-vehicle communication, but serially over time, not simultaneously in real-time.
After all, every vehicle's data is being incorporated into the maps. That information "helps them cheat, effectively," Levandowski said. With the map data—or as we might call it, experience—all the cars need is their precise position on a super accurate map, and they can save all that parsing and computation (and vehicle-to-vehicle communication).
There's a fascinating parallel between what Google's self-driving cars are doing and what the Andreessen Horowitz-backed startup Anki is doing with its toy car racing game. When you buy Anki Drive, they sell you a track for the cars to race on, with positioning data embedded in it. The track is the physical manifestation of a virtual racing map.
Last year, Anki CEO Boris Sofman (like Urmson, a Carnegie Mellon robotics guy) told me that knowing the racing environment in advance allows them to more easily sync the state of the virtual world in which their software is running with the physical world in which the cars are driving.
"We are able to turn the physical world into a virtual world," Sofman said. "We can take all these physical characters and abstract away everything physical about them and treat them as if they were virtual characters in a video game on the phone."
Of course, when there are bicyclists and bad drivers involved, navigating the hybrid virtual-physical world of Mountain View is not easy: the cars still have to "race" around the track, plotting trajectories and avoiding accidents.
The Google cars are not dumb machines. They have their own set of sensors: radar, a laser spinning atop the Lexus SUV, and a suite of cameras. And they have some processing on board to figure out what routes to take and avoid collisions.
This is a hard problem, but Google is doing the computation with what Levandowski described at Nissan as a "desktop" level system. (The big computation and data processing are done by the teams back at Google's server farms.)
What that on-board computer does first is integrate the sensor data. It takes the data from the laser and the cameras and integrates them into a view of the world, which it then uses to orient itself (with the rough guidance of GPS) in virtual Mountain View. "We can align what we're seeing to what's stored on the map. That allows us to very accurately—within a few centimeters—position ourselves on the map," said Dmitri Dolgov, the self-driving car team's software lead. "Once we know where we are, all that wonderful information encoded in our maps about the geometry and semantics of the roads becomes available to the car."
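As a toy illustration of that alignment step, here is a brute-force version in Python: search the positions near a rough GPS guess for the one that makes the observed obstacles line up best with the pre-built map. The grid representation, search window, and names are assumptions I've made for the sketch; the real scan matching is far more sophisticated.

```python
# Hedged sketch (not Google's algorithm): align a scan to a prior map near a
# rough GPS guess by maximizing how many observed obstacle cells land on
# mapped obstacles.
import numpy as np

def localize(map_grid: np.ndarray, scan_points: np.ndarray,
             gps_guess: tuple[int, int], window: int = 5) -> tuple[int, int]:
    """Return the (row, col) position near gps_guess where the scan best fits the map.

    map_grid:    pre-built occupancy grid of the empty street (1 = obstacle)
    scan_points: N x 2 array of obstacle cells seen by the car, relative to the car
    gps_guess:   rough position from GPS, in grid cells
    """
    best_offset, best_score = gps_guess, -1
    rows, cols = map_grid.shape
    for dr in range(-window, window + 1):
        for dc in range(-window, window + 1):
            r0, c0 = gps_guess[0] + dr, gps_guess[1] + dc
            shifted = scan_points + np.array([r0, c0])
            valid = (shifted[:, 0] >= 0) & (shifted[:, 0] < rows) & \
                    (shifted[:, 1] >= 0) & (shifted[:, 1] < cols)
            # Count observed obstacle cells that coincide with mapped obstacles.
            score = map_grid[shifted[valid, 0], shifted[valid, 1]].sum()
            if score > best_score:
                best_score, best_offset = score, (r0, c0)
    return best_offset

# Toy usage: a small map with a curb along the top row and a pole at (4, 2).
street = np.zeros((6, 6), dtype=int)
street[0, :] = 1                                   # mapped curb
street[4, 2] = 1                                   # mapped pole
# The car sees the curb two cells ahead and the pole two cells behind and to
# the right, all relative to itself; its true position is (2, 1).
scan = np.array([[-2, -1], [-2, 0], [-2, 1], [2, 1]])
print(localize(street, scan, gps_guess=(3, 3)))    # -> (2, 1), despite a sloppy GPS fix
```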
Once they know where they are in space, the cars can do the work of watching for and modeling the behavior of dynamic objects like other cars, bicycles, and pedestrians.
Here, we see another Google approach. Dolgov's team uses machine learning algorithms to create models of other people on the road. Every single mile of driving is logged, and that data is fed into computers that classify how different types of objects act in all these different situations. While some driver behavior could be hard-coded in ("When the lights turn green, cars go"), the team doesn't exclusively program that logic; the cars learn it from actual driver behavior.
Just as we know that a car pulling up behind a stopped garbage truck will probably change lanes to get around it, the Google software, built on 700,000 miles of driving data, has learned that the car is likely to do exactly that.
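To make "learn it from actual driver behavior" concrete, here is a miniature sketch: fit a simple classifier on a handful of logged situations, then ask it about a new car pulling up behind a stopped truck. The features, the made-up data, and the choice of logistic regression are mine, for illustration only; this is not the team's actual model.

```python
# Hedged sketch (not the team's model): learning maneuver expectations from
# logged drives instead of hand-coding them.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each logged observation of another vehicle:
# [closing_speed_mps, gap_to_obstacle_m, obstacle_is_stopped]
# label = 1 if that vehicle changed lanes shortly afterward.
X = np.array([
    [4.0,  8.0, 1],   # closing fast on a stopped garbage truck -> changed lanes
    [3.5, 10.0, 1],
    [0.5, 40.0, 0],   # cruising behind moving traffic -> stayed in lane
    [0.2, 35.0, 0],
    [5.0,  6.0, 1],
    [0.3, 50.0, 0],
])
y = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression().fit(X, y)

# A new car pulls up behind a stopped truck: the model expects a lane change.
new_situation = np.array([[4.5, 7.0, 1]])
print(model.predict_proba(new_situation)[0, 1])  # probability of a lane change
```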
Most driving situations are not hard to comprehend, but what about the tough ones, or the unexpected ones? In Google's current process, a human driver takes control and (so far) safely guides the car. But fascinatingly, in those moments when a human has to take over, what the Google car would have done is also recorded, so that engineers can test what would have happened in extreme circumstances without endangering the public.
So, each Google car is carrying around both the literal products of previous drives—the imagery and data captured from crawling the physical world—as well as the computed outputs of those drives, which are the models for how other drivers might behave.
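Here is one way such a counterfactual replay might be wired up: feed the recorded frames back through the planner and line its decisions up against what the human driver actually did. The log format and planner interface here are invented for the example; Google's actual tooling is not public, so treat this purely as a sketch of the concept.

```python
# Hedged sketch (assumed workflow, not Google's tooling): replay a logged
# disengagement and compare the planner's decisions with the safety driver's.
from dataclasses import dataclass

@dataclass
class LoggedFrame:
    sensor_snapshot: dict      # what the car saw at this moment
    human_steering: float      # what the safety driver actually did (radians)

def replay(log: list[LoggedFrame], planner) -> list[tuple[float, float]]:
    """Run the planner over recorded frames and pair its output with the human's."""
    comparisons = []
    for frame in log:
        planned_steering = planner(frame.sensor_snapshot)   # counterfactual decision
        comparisons.append((planned_steering, frame.human_steering))
    return comparisons

# Toy usage with a stand-in planner that always holds the lane.
log = [LoggedFrame({"obstacle_ahead": True}, human_steering=0.3),
       LoggedFrame({"obstacle_ahead": False}, human_steering=0.0)]
hold_lane_planner = lambda snapshot: 0.0
for planned, actual in replay(log, hold_lane_planner):
    print(f"planner: {planned:+.2f}  human: {actual:+.2f}")
```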
There is, at least in an analogical sense, a connection between how the Google cars work and how our own brains do. We tend to think of seeing as taking in sensory input and acting accordingly. Really, our brains are making predictions all the time, which guide our perception. The actual sensory input—the light falling on retinal cells—is secondary to the prior experience we've built into our brains through years of being in the world.
That Google's self-driving cars are using these principles is not surprising. That they are having so much success doing so is.
Peter Norvig, Google's director of research, and two of his colleagues coined the phrase "the unreasonable effectiveness of data" in an essay describing the effect of huge amounts of data on very difficult artificial intelligence problems. And that is exactly what we're seeing here. A kind of Googley mantra concludes the Norvig essay: "Now go out and gather some data, and see what it can do."
Even if it means continuously, endlessly driving 4 million miles of roads with the most sophisticated cars on Earth and then hand-massaging that data—they'll do it.
That's the unreasonable effectiveness of Google.