Data scientists: Stop trying to just predict; look back, and think
In the rush to predict the future, many are passing up the opportunity to leverage big data's potential to form a more complete picture of how the world works, leading scientists say.
Sometimes you miss the forest for the trees, and sometimes you get so caught up in plotting a path through the forest that you never stop to figure out how the forest got there in the first place.
Taking a look backward instead of forward was a major theme at the 2nd annual Federal Big Data Summit sponsored by the Advanced Technology Academic Research Center, a non-profit forum for government, industry and academia to collaborate.
“You absolutely have to look back to make sense of your data,” said Nagesh Rao, chief technologist at the Small Business Administration. Rather than just blindly forging ahead with predictive analytics, looking back provides the opportunity to digest information and figure out why things happen, Rao said.
That deeper understanding – drawing “causal inferences” – is sorely needed.
Right now, humanity has a “component-level understanding of complicated systems, not a system-level understanding,” said DARPA program manager Paul Cohen. That, he said, “is a dangerous place to be in,” as humans feel confident enough to interfere with individual systems yet cannot measure the impact those interferences have on the entire global ecosystem.
Cohen used the example of dumping iron filings into the ocean to promote algae blooms. The algae consume carbon dioxide, die and sink to the bottom of the ocean, effectively trapping the greenhouse gas underwater. While it may seem like a perfect climate change-fighting tactic, Cohen warned that we really don’t understand the broader implications of widespread iron dumping.
Jennifer Bachner, director of the government certificate programs at Johns Hopkins University, echoed Rao and Cohen. She called for data studies to move beyond "prediction" and into "causal studies," which she said will require cooperation among data scientists, social scientists and others across a wide variety of disciplines.
For Cohen, machines will provide the mulling-over power that humans lack.
"The vast majority of scholarly work we do never gets synthesized into a model of how the world works," Cohen noted, lamenting the thousands of pages of research that are published each year and then lie unread. "The goal three to five years from now is that machines will read everything that everyone writes."
Cohen is working toward that goal with DARPA's Big Mechanism program, which processes cancer research.
Humans will still play the crucial role of supplying the thinking. "[IBM's] Watson itself doesn't have a causal understanding of anything," he said, noting that the system can answer a question only if someone, somewhere, has already written and uploaded that answer.
The assembled speakers did, of course, tout the predictive power of big data analytics, with Bachner looking forward to predictive policing – "kind of like 'Minority Report,' but in the real world" – and Rao noting how big data can help identify emerging trends and opportunities for investment in the critical technologies of the future.
The FCC’s Tony Summerlin cautioned that it’s a “waste of time” when people “use data to prove things that are well-known to everyone.”
But for those answers that are not well-known, and for crafting a bigger-picture view of how and why the world works, big data holds immense promise – if humans are willing to take a look back and work with machines, said Cohen.
“The highest quality knowledge is about small numbers of things,” he added, saying all those small bits of information need to be brought together in a more cohesive whole. The end goal: developing “causal knowledge,” not just statistical predictions.