Data mining: The new weapon in the war on terrorism?

The use of the technology to probe vast amounts of phone data could be costly and invade privacy.

If the government is analyzing Americans’ phone records to discover and track terrorist networks — or ever plans to do so — the requisite technology would cost a lot of money, demand considerable computing power and raise privacy issues, observers say.

The possibility that the government is sifting through tens of millions of phone records came to the public’s attention earlier this month after USA Today reported that the National Security Agency had collected records from AT&T, Verizon and BellSouth.

Although it is unknown if the government is probing phone records for national security purposes, the possibility shines a spotlight on the potential benefits and drawbacks of a sophisticated technology that few people fully understand.

That technology is data mining, or extracting knowledge from a vast amount of data. The technique requires superfast computers and software capable of running complex algorithms, experts say.

Nathan Hoskin, chief architect of Planning Systems, a data analysis and engineering systems developer, said the government would need supercomputers “on the scale of Blue Gene or Columbia — or you could also create what amounts to a supercomputer out of hundreds or thousands of regular PCs.”

The development of a data-mining system that could analyze U.S. phone data would cost somewhere in the range of $20 million to $50 million, added Hoskin, whose company has worked with federal agencies.

If telecommunications companies hand over their records, three kinds of algorithms might be helpful in investigating potential terrorist cells: clustering algorithms, link analysis and association rule mining.

The first — clustering algorithms — focuses on pieces of data that are similar to one another. The second — link analysis — attempts to connect the dots among disparate datasets, such as terrorist conspirators scattered worldwide.
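As a rough illustration of the clustering idea, the sketch below groups phone numbers by two made-up features, calls per day and average call length in minutes, using a bare-bones k-means loop. The features, the sample values and the function name `k_means` are assumptions chosen for illustration; nothing here reflects how any agency or vendor actually clusters call data.

```python
import math

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning each to its nearest centroid."""
    # Assumption: seed with the first k points so the result is deterministic.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

# Hypothetical (calls per day, average minutes per call) for six phone numbers.
profiles = [(2, 30), (40, 1), (3, 28), (42, 2), (1, 35), (38, 1)]
clusters = k_means(profiles, k=2)
print(clusters)
```

With this toy data, the numbers that make a few long calls end up in one group and those that make many short calls in the other.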

“Terrorists are smart enough to know that if ‘Al’ and ‘Joe’ are both known criminals, they can’t talk directly without attracting law enforcement’s attention,” said Hoskin, who has worked on data analysis and data-mining projects for corporations such as Equifax and Enron during his 25-year career. “With link analysis algorithms, you can start looking for common sets of paths [or] routes.”

For example, intelligence officials might be able to identify a terrorist cell leader by tracing call routes. The algorithm might show that a Texas-based terrorist who attacked a facility in Austin had previously communicated with a conspirator in Oklahoma City, who had spoken with a co-conspirator in Boston, who in turn had been in touch with someone in Spain, and on and on, until the call route stopped in Pakistan. Then the officials may target the Pakistani caller as a possible cell leader.

But this approach can produce meaningless data because it becomes harder for the link analysis to connect the dots once the route extends five or six hops, Hoskin said.
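The call-route tracing described above can be sketched as a breadth-first search over a graph of who called whom. This is a minimal illustration under stated assumptions: the function name `trace_call_routes`, the city labels standing in for phone numbers and the `max_hops` cutoff (reflecting Hoskin's point that long routes lose meaning) are all hypothetical.

```python
from collections import deque

def trace_call_routes(calls, start, max_hops=5):
    """Return {party: shortest call route from start} for parties within max_hops."""
    # Build an undirected adjacency map: a call links both parties.
    graph = {}
    for a, b in calls:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Stop expanding once a route has reached the hop limit.
        if len(paths[node]) - 1 >= max_hops:
            continue
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths

# Hypothetical call records mirroring the example route in the article.
calls = [
    ("Austin", "Oklahoma City"),
    ("Oklahoma City", "Boston"),
    ("Boston", "Spain"),
    ("Spain", "Pakistan"),
]
routes = trace_call_routes(calls, "Austin")
print(routes["Pakistan"])
```

Lowering `max_hops` to 3 drops the Pakistan endpoint from the results, which shows how a hop limit trades reach for fewer spurious connections.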

The third method, association rule mining, looks for patterns within data. If Al immediately calls Pakistan every time he gets a call from Oklahoma City, the algorithm associates calls originating in Oklahoma City with calls placed to Pakistan. The association may raise red flags for intelligence officials.
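A simplified version of that pattern search counts how often an incoming call from one place is immediately followed by an outgoing call to another, then keeps the pairs that occur often enough (support) and reliably enough (confidence). The event format, the thresholds and the function name `next_call_rules` are assumptions made for this sketch; production association rule miners handle far larger and messier itemsets.

```python
from collections import Counter

def next_call_rules(events, min_confidence=0.8, min_support=2):
    """Find rules of the form 'incoming call from X -> next call goes out to Y'."""
    incoming = Counter()   # how many times a call came in from each place
    followed = Counter()   # how many times (origin, destination) occurred back to back
    for current, nxt in zip(events, events[1:] + [None]):
        direction, place = current
        if direction != "in":
            continue
        incoming[place] += 1
        if nxt is not None and nxt[0] == "out":
            followed[(place, nxt[1])] += 1

    rules = {}
    for (origin, dest), count in followed.items():
        confidence = count / incoming[origin]
        if count >= min_support and confidence >= min_confidence:
            rules[(origin, dest)] = confidence
    return rules

# Hypothetical call log for one phone: (direction, other party's location).
events = [
    ("in", "Oklahoma City"), ("out", "Pakistan"),
    ("in", "Boston"), ("out", "Austin"),
    ("in", "Oklahoma City"), ("out", "Pakistan"),
    ("in", "Boston"), ("out", "Spain"),
    ("in", "Oklahoma City"), ("out", "Pakistan"),
]
rules = next_call_rules(events)
print(rules)
```

In this toy log, only the Oklahoma City to Pakistan pairing clears both thresholds; the Boston calls are followed by different destinations each time and are discarded.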

Computer programs can combine all of those algorithms, too. If the composite picture points to the same person, the government could decide to probe every contact that person has called in the past few years.

Hoskin said he thinks the government would be reluctant to delve into this sort of personal information until the data mining produces convincing evidence.

“If I was an agent of the government, it wouldn’t be until the point that something had really piqued my interest that I’d say… ‘Do a lookup on this number and find all the people associated with it,’” he said.

But privacy advocates say mining phone records could produce a mountain of civil rights violations without ever generating one lead.

Jay Stanley, public education director of the American Civil Liberties Union’s technology and liberty program, said intelligence work could easily creep from mining to wiretapping and other modes of surveillance. “We have to expect that anybody that gets flagged by one tool, like this telephone records database, would find themselves subject to the National Security Agency’s other spying tools, whatever those might be.”

Critics say the possible data-mining initiative resembles the Defense Department’s scrapped Total Information Awareness program, which was envisioned as a way to anticipate potential terrorist attacks by analyzing patterns from a massive and wide-ranging database of electronic information.

“There’s a lot of evidence that the National Security Agency is engaging in data-mining activities that do bear some resemblance to the TIA program,” Stanley said. “I think one of the primary questions that Congress needs to investigate is to what extent they are engaging in TIA-like activities by sharing private phone records.”

Even if phone companies are not giving out personal identifiers — customers’ names, street addresses and other personal information — the government can obtain personal information from a phone number via other databases and services, according to data-mining experts.

“It would take a large bank, much less the National Security Agency, about 10 minutes to assign names to all those phone numbers,” Stanley added.

Earlier this month, a federal auditor testified to the House Judiciary Committee’s Commercial and Administrative Law Subcommittee that agencies had failed to comply with data-mining protocols as recently as August 2005.

“Increased use by federal agencies of data mining — the analysis of large amounts of data to uncover hidden patterns and relationships — has been accompanied by uncertainty regarding privacy requirements and oversight of such systems,” said Linda Koontz, information management issues director at the Government Accountability Office, testifying before the subcommittee.

“As we reported in previous work, the result was that although agencies employing data mining took many steps needed to protect privacy, such as issuing public notices, none followed all key procedures, such as including in these notices the intended uses of personal information,” she said.

Observers who compare wiretapping with examining phone records say both pose threats to Americans’ privacy.

“Listening to the content of calls is more intrusive, but nobody should underestimate the privacy invasion that’s involved in tracing who’s talking to whom,” Stanley said. He added that the effort could expose innocent citizens’ calls to therapists, lovers and hot lines.

“People have the implicit expectation that the list of people they call will not be shared with their neighbors or the government,” Stanley said.

Mining phone records to find terrorists could be a waste of time, akin to tagging the entire U.S. population as possible suspects, he said.

“Most of the successes we’ve seen in the national security area seem to be old-fashioned, stick-to-the-basics investigative work…start from known leads and work outward,” Stanley said.

Mining phone data

Data-mining expert Nathan Hoskin, who has worked on data analysis projects for corporations such as Equifax and Enron during his 25-year career, said the government is probably interested in two kinds of data that telecommunications companies collect: billing information, including call logs and fees, and proprietary analyses of their networks’ quality and performance.

To provide better service and maximize revenue, telecom companies monitor the types of phone technologies in use — such as voice over IP, cellular and landline — the frequency of each system’s use and the costs of operating each system, Hoskin said. Those measurements can pinpoint overloaded switches and inform phone companies of sites that need increased capacity.

Because terrorists are not likely to use phones registered to them or even consistently use the same phone, the second set of data can reveal information that billing data cannot, Hoskin said. For example, network analyses can show where a call originated, the length of the call, the technology that supported the connection and the quality of the connection.

The government can blend both sets of data for even more clues.

“The terrorists who are looking to do harm, as a safe bet, can assume they are being watched at this point,” Hoskin said. “So they are going to try to find ways, like a raccoon, to cover [their] scent.”
