Nate Silver: Big data's value is not in the data
Without good analysis, data can lead people down wrong roads.
Big data isn't the deep well of critical information it's cracked up to be if the people or systems peering into it don't know what they're looking for, according to one of the world's leading number-crunchers.
Nate Silver, who successfully predicted the outcome of the presidential election in 49 of 50 states in 2008 and in all 50 in 2012 through his close analysis of statistics and polls, said the vast pools of information agencies have amassed, or inherited, are only as good as the bucket used to draw from them.
"Large volumes of data can make people biased," he told attended at the 2013 SAS Government Leadership Summit, held in Washington, D.C. on May 21.
Data is accumulating at a vast and steadily growing rate, and Silver noted that when people are hit with a lot of it, they tend to polarize their views and fail to look beyond the immediate.
The onslaught of data isn't letting up. "Ninety percent of all data has been created in the last two years," he said. He likened the spike in data creation to the paradigm shift that split governments and society after the printing press was invented in the 15th century.
"There are huge gains to be had, but there is a gap between what you know and what you think you know," he said.
Analyzing big data, whether weather forecasts, financial figures or other vast reservoirs of information, has predictable pitfalls that can be avoided, according to Silver. He warned in particular against seeing things in the data that aren't really there. "We're wired to detect patterns," he said, but a pattern may not be exactly what it appears to be at first glance.
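To see why pattern-hunting can mislead, consider a quick illustration (not from Silver's talk, and using purely made-up random data): if you compare enough random series, some of them will look strikingly correlated even though no real relationship exists. The Python sketch below generates 1,000 random series and checks each against an unrelated random target.

```python
# A small demonstration of why "we're wired to detect patterns" is a hazard:
# with enough purely random series, some will appear strongly correlated.
import numpy as np

rng = np.random.default_rng(42)

n_series, n_points = 1000, 30
data = rng.normal(size=(n_series, n_points))     # 1,000 random "indicators"
target = rng.normal(size=n_points)               # an unrelated random "outcome"

# Correlation of each random series with the unrelated random target.
correlations = np.array([np.corrcoef(row, target)[0, 1] for row in data])

print(f"Strongest apparent correlation: {np.abs(correlations).max():.2f}")
print(f"Series with |r| > 0.4: {(np.abs(correlations) > 0.4).sum()} of {n_series}")
```

Dozens of these random series will typically clear that threshold, which is exactly the kind of "pattern" an analyst fishing through a large dataset can mistake for a finding.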
He also warned that data can offer seductive but possibly dangerous illusions if it is not addressed honestly. "Describing data is important, but it isn't a prediction," he said.
Silver noted that Japanese earthquake researchers had examined data from the Fukushima area before a nuclear plant was built there. Using a historical record that went back only 45 years, they found no earthquakes greater than 8.0 on the Richter scale; they did not look further back at records showing the area had, in fact, been hit by a larger quake. They consequently assumed that a 9.0 quake wasn't possible in the area and designed the reactors to withstand an 8.5 quake. The 9.0 earthquake that shook Japan and set off a historic tsunami in March 2011 also crippled the reactors at the Fukushima plant.
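The hazard of extrapolating from a short record can be sketched with a toy calculation; the counts below are invented for illustration, not actual Fukushima-area data. Seismologists commonly describe quake frequency with the Gutenberg-Richter relationship, in which the logarithm of the annual rate falls roughly linearly with magnitude. Fitting that line to the smaller quakes in a record implies a rare but non-zero rate for a 9.0, whereas stopping the analysis at the largest quake seen in a 45-year window makes a 9.0 look impossible.

```python
# Toy Gutenberg-Richter extrapolation (illustrative only; these counts are
# assumed values, not real seismic data).
import numpy as np

# Hypothetical 45-year record: number of quakes at or above each magnitude.
magnitudes = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5])
counts     = np.array([400, 130,  40,  13,   4,   1])

years = 45.0
annual_rate = counts / years

# Gutenberg-Richter: log10(rate) is roughly linear in magnitude.
slope, intercept = np.polyfit(magnitudes, np.log10(annual_rate), 1)

# Extrapolate the fitted line to a magnitude-9.0 event.
rate_m9 = 10 ** (intercept + slope * 9.0)
print(f"Estimated M9.0 rate: {rate_m9:.5f} per year "
      f"(return period ~{1 / rate_m9:,.0f} years)")
```

The point is not the specific number but the shape of the reasoning: a record with no 8.0-plus quakes in it does not mean the rate of larger quakes is zero, only that the window was too short to catch one.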
To avoid such pitfalls in analyzing data, Silver recommended thinking "probabilistically," incorporating potential problems into the analysis.
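As a minimal sketch of what thinking probabilistically can look like in practice (the polling numbers here are assumed, and this is not Silver's actual model), the example below turns a single point forecast into a distribution of simulated outcomes and reports a win probability and an interval instead of one number.

```python
# A minimal sketch of "thinking probabilistically": instead of reporting a
# single point estimate, simulate many plausible outcomes and report a
# probability and a range. All inputs are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

poll_margin = 2.0    # candidate leads by 2 points in the polling average (assumed)
poll_error  = 3.5    # assumed standard error of the polling average

# Simulate 100,000 plausible election-day margins around the polling average.
simulations = rng.normal(poll_margin, poll_error, size=100_000)
win_probability = (simulations > 0).mean()

print(f"Point forecast: +{poll_margin:.1f} points")
print(f"Probability of winning: {win_probability:.0%}")
print(f"90% interval: {np.percentile(simulations, 5):+.1f} to "
      f"{np.percentile(simulations, 95):+.1f} points")
```

Framing the answer as "roughly a 70 percent chance, with a margin that could plausibly swing the other way" builds the potential problems into the forecast itself, rather than treating the point estimate as certainty.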