Does the big data era demand new rules of the road?
The gray area for those developing guidelines lies in the purpose behind data collection and the issue of subsequent, unforeseen uses.
As policymakers wrestle with the emerging science of big data, the White House is expected to issue a report in the coming weeks – spearheaded by senior advisor John Podesta – that will detail how government and the private sector can take advantage of big data's opportunities while remaining mindful of privacy risks.
One tricky element of any big data discussion is moving past the buzzwords and 30,000-foot vantage points to get some precision on what the term means. Maureen K. Ohlhausen, a commissioner at the Federal Trade Commission, offered the "three V's" approach in an April 22 speech at Georgetown Law School, defining big data as data of high volume, with variety across structured and unstructured datasets from different sources, that can be produced and analyzed at high velocity. (Others have added a fourth and even a fifth V to the definition.)
As a regulator, Ohlhausen approaches the big data issue from a few different perspectives. When it comes to data security, the FTC's jurisdiction is well established: a federal court recently affirmed, in the case of the Wyndham Hotels data breach, that the FTC has the authority to bring cases against companies for inadequately protecting customer data. That has big implications for big data, Ohlhausen noted.
"The FTC's data security enforcement framework isn't perfect," she said. "I would like to develop more concrete guidance to industry, for example. But I haven't seen anything that suggests that big data technology raises fundamentally new data security issues," she said.
The real gray area in regulating big data lies in the purpose behind data collection and the issue of subsequent, unforeseen uses. According to the data-collection best-practices framework known as the Fair Information Practice Principles (FIPPs), data should be collected for stated purposes, with the consumer's consent, and with minimal retention.
The problem of consent was highlighted by former Census Director Bob Groves, now provost of Georgetown University, at a panel discussion following Ohlhausen's speech. Big datasets are often generated by sources such as sensor networks rather than designed by statisticians or researchers. "They're often just a single observation with a time stamp and a location stamp," he said. Such datasets are frequently held as proprietary assets by organizations that lack clear rules for sharing the data with researchers.
As Ohlhausen pointed out, there are obvious tensions between the FIPPs framework and the way big data is used in the real world, where researchers, firms and governments are looking to combine and reuse information in ways not contemplated when consumer consent was given. "Companies cannot give notice at the time of collection for unanticipated uses," she said.
"Strictly limiting the collection of data to the particular task at hand and disposing of it afterward would handicap the data scientist's ability to find new information to address future tasks," Ohlhausen said. "Certain de-identification techniques such as anonymization, although not perfect, can help mitigate some of the risks of comprehensive data retention while permitting innovative big data analysis to proceed."
From a policy point of view, she said, the Fair Credit Reporting Act might provide some useful guidance. The 1970 law restricts how and with whom credit bureaus can share personal information. Restricting "clearly impermissible uses" of consumer data in a similar way would let the FTC maintain its traditional enforcement role in data privacy and protection while leaving private-sector innovators free to pursue big data applications. "The FTC should remain vigilant for deceptive and unfair uses of big data, but should avoid preemptive action that could preclude entire future industries," she said.
The stakes are high, at least from a public policy standpoint, Groves said. "I firmly believe that the country that's able to fashion a privacy environment and a statistical and computer science environment that allows the country to learn how these data can inform multiple big policy issues will be the country that wins in the end," he said.