Where oh where does all the data go?

GAO study provides Congress with a look at info that agencies collect.

General Accounting Office officials have given Congress an extensive report on the federal government's use of data mining. Lawmakers must weigh the potential dangers exposed in the report.

Sen. Daniel Akaka (D-Hawaii), who requested the GAO study, is considering whether to propose legislation to curb privacy abuses that can occur in data mining when unclassified computer data is shared and analyzed.

For the study, GAO auditors reviewed data mining that relies on statistical modeling techniques. The auditors defined data mining as a tool for helping analysts discover hidden patterns in data and make predictions based on those patterns. The 64-page report did not examine data-mining activities that are classified.

In a statement about the GAO findings, Akaka remarked that "the breadth of data-mining activities across the federal government involving personal information demonstrates the need for policies and safeguards."

At Akaka's request, GAO officials have begun a follow-up study that will disclose more details of federal data-mining activities. The second study could take a year to complete. Meanwhile, lawmakers and citizens are trying to work out whether they should fear data mining.

Some of the dangers associated with data mining were publicized in 2003, when Congress cut off funding to the Defense Advanced Research Projects Agency for a project known as Total Information Awareness. After the Sept. 11, 2001, terrorist attacks, researchers began experimenting with pattern-based queries across many large databases. They were hoping to discover a sequence of activities that might reveal a hidden terrorist plot in the making.

The research required government officials to rake through huge amounts of data held in public and commercial databases. Some experts compared it to searching for a needle in a haystack. Privacy advocates protested, citing the likelihood that innocent people would be mistaken for terrorists. Lawmakers quickly pulled the plug on the DARPA project.

Privacy groups have complained about the flawed nature of such data-mining efforts. Because terrorist attacks are statistically rare, it is difficult to use pattern-based queries to predict the next attack, said Lara Flint, staff counsel for the Center for Democracy and Technology, an advocacy group for civil liberties. "Academic researchers who are working on this kind of thing would tell you we're years away from doing this in an effective way, if we ever get there," she said.
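The statistical point about rare events can be illustrated with a short calculation. The sketch below is not drawn from the GAO report or from Flint's remarks. The population size, the number of real plotters and the accuracy figures are all hypothetical, chosen only to show how a very accurate pattern-matching rule can still flag far more innocent people than real suspects when the thing it is looking for is rare.

```python
def flagged_breakdown(population, true_cases, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening rule.

    sensitivity: share of real cases the rule correctly flags.
    false_positive_rate: share of innocent people the rule wrongly flags.
    """
    innocent = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = innocent * false_positive_rate
    flagged = true_positives + false_positives
    # Precision: of everyone flagged, the share who are real cases.
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical inputs for illustration only; none come from the GAO report.
tp, fp, precision = flagged_breakdown(
    population=300_000_000,     # assumed number of people screened
    true_cases=1_000,           # assumed number of real plotters
    sensitivity=0.99,           # assumed: rule catches 99% of real cases
    false_positive_rate=0.001,  # assumed: rule misfires on 0.1% of innocents
)
print(f"correctly flagged: {tp:,.0f}")
print(f"wrongly flagged: {fp:,.0f}")
print(f"share of flags that are real: {precision:.2%}")
```

Under these assumed numbers, the rule wrongly flags about 300,000 innocent people while correctly flagging fewer than 1,000, so well under 1 percent of flagged individuals would be real cases. Different assumptions change the totals, but the imbalance persists whenever real cases are a tiny fraction of the population.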

Flint said the recent GAO report shows the need for more congressional oversight.

At the same time, the report did give lawmakers a glimpse of a brighter side of data mining that has been eclipsed by the Total Information Awareness controversy. GAO auditors found examples of federal agencies using the pattern-matching capabilities of data mining for purposes that few lawmakers would probably find offensive.

Even Akaka noted in his written response to the GAO report that "not all data mining is necessarily invasive or violative of an individual's civil liberties."

Based on what 128 federal agencies reported to GAO, the purpose behind 65 out of a total of 199 data-mining projects is to discover ways to improve the services those agencies provide. Another 24 of the projects are designed to help agencies detect fraud, waste and abuse. And 17 data-mining activities are helping agencies manage their human resources.

The use of personal information in data mining is what concerns lawmakers and privacy law experts such as Jeffrey Rosen, professor of law at the Georgetown University Law Center. Rosen, who spoke about data mining at a recent conference in Washington, D.C., said the privacy dangers have united two groups of citizens who normally hold opposing views — groups he described as civil-libertarian liberals and libertarian conservatives.

"It is not wrong to fear or be concerned that when the government has broad access to unregulated amounts of data that misuses could take place," he said.

Two things must happen next, Rosen said. First, software engineers must develop better data-mining technologies, and second, Congress must provide new legal protections.

Useful technologies already exist. But Rosen said additional laws and technical safeguards are needed to guarantee personal privacy while giving government officials access to the information they need for legitimate purposes.