Digging digital gold

From the Internal Revenue Service, which collects money, to the Defense Department, which spends a lot of it, government agencies are turning to an advanced form of computer analysis called data mining to uncover fraud, keep better track of supplies and improve budget forecasting.

Adopting the same techniques that private-sector marketers have developed

to track consumer spending and predict what customers will buy, agencies

are using computers to sift through vast amounts of data to uncover hidden

patterns that might indicate where fraud or inefficiency is occurring.

Eventually, data mining experts say, the technique may be used for purposes

such as improving aircraft safety, producing better drugs and securing the

Internet.

Mining for Fraud

The Defense Finance and Accounting Service, which pays billions of dollars

worth of military bills each year, is a leader in the data mining field.

DFAS is testing data mining as a way to discover billing errors and fraud.

In a test that began in November and continues to July, DFAS' vendor

pay branch uses data mining to search 2.5 million financial transactions

for signs of inaccurate charges. Computers use data mining software

to screen each transaction for 80 different elements, from what was bought

and at what price to how it compares with previous purchases.

Although the test isn't completed, the effort so far has pointed out

several hundred bills that may warrant further investigation, said vendor

pay branch chief David Riney.

An earlier DFAS data mining test focused on government purchase cards,

which government employees use to buy airline tickets, rent cars, and pay

for hotel bills and meals. In some agencies, employees use the cards to

make 80 percent to 90 percent of office supply purchases that are less than

$2,500.

The problem is how to pick fraudulent transactions out of the millions of

transactions that DOD processes each year. "In the past, we have relied

on tipsters" to point out fraud, Riney said.

Using SPSS Inc.'s Clementine software, the agency

searched 125,000 transactions made on 40,000 purchase card accounts. In

addition to examining the obvious — payment amount, the date and time the

purchase was made and the type of vendor — computers delve into cardholder

information, account transaction limits, billing cycles and purchase histories.

As the computer searches the transactions, data patterns that might

indicate improper use emerge — such as purchases made on weekends and holidays,

entertainment expenses, unusually frequent purchases, multiple purchases

from a single vendor and other transactions that do not correspond to the

agency's past purchasing patterns.
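
The kinds of warning signs listed above can be illustrated with a few hand-written rules. The sketch below is only an illustration, not the method DFAS or the Clementine software uses: the sample records, the `flag_transactions` function and the `max_same_vendor` threshold are all hypothetical, and a real data mining tool would learn patterns from the data rather than rely on fixed rules.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (account id, vendor, purchase date, amount in dollars).
TRANSACTIONS = [
    ("acct-1", "office-supply", date(2000, 3, 6), 120.00),
    ("acct-2", "sporting-goods", date(2000, 3, 11), 900.00),
    ("acct-3", "electronics", date(2000, 3, 7), 2400.00),
    ("acct-3", "electronics", date(2000, 3, 8), 2450.00),
    ("acct-3", "electronics", date(2000, 3, 9), 2300.00),
]


def flag_transactions(transactions, max_same_vendor=2):
    """Return {record index: [reasons]} for records that match simple rules.

    max_same_vendor is an assumed threshold: an account with more purchases
    than this from one vendor has all of those purchases flagged.
    """
    # Count how many times each account bought from each vendor.
    per_vendor = Counter((acct, vendor) for acct, vendor, _, _ in transactions)

    flags = {}
    for index, (acct, vendor, when, _amount) in enumerate(transactions):
        reasons = []
        if when.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
            reasons.append("weekend purchase")
        if per_vendor[(acct, vendor)] > max_same_vendor:
            reasons.append("repeated purchases from one vendor")
        if reasons:
            flags[index] = reasons
    return flags


for index, reasons in flag_transactions(TRANSACTIONS).items():
    print(TRANSACTIONS[index][0], TRANSACTIONS[index][1], reasons)
```

A flag here is only a prompt for a human to look closer. Rules this simple would also flag legitimate purchases, which is the kind of false alarm the test ran into.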

In its data mining test, DFAS turned up a cluster of 345 cardholders

who had made suspicious purchases, some of whom are still under investigation.

But the process needs some fine-tuning, Riney said. For example, purchases

of golf equipment seemed suspicious until investigators learned that a military

recreation manager had authority to buy the equipment. And expenses the

computer said were charged to a "casino" turned out to be an ordinary hotel

bill.

Nevertheless, the data mining results have been promising enough that

Riney predicted data mining will become a regular part of DFAS efforts to

stop fraud.

Finding Patterns

Indeed, numerous agencies have begun to pick through databases for information

that can improve agency operations. "Running the business of government

is where we see the growth in data mining," said Mark Battaglia, executive

vice president for marketing at SPSS Inc., the nation's largest data mining

software company.

The Army has used data mining to try to identify sources of delay in

its "order and ship" process of delivering supplies to overseas bases. NASA

has considered using predictive data mining to search aircraft maintenance

and mishap data for factors that might predict accidents.

Also, the Federal Aviation Administration has hired Mitre Corp. to find

ways it can mine aircraft accident data for clues about their causes and

how those clues could help prevent future crashes. Already, Mitre has found

that planes equipped with instrument displays that can be read without requiring

a pilot to look away from the windshield were damaged less in runway accidents

than planes without them.

But the government is cautious about committing much money to data mining.

"One of the problems is how do you prove that you kept the plane from falling

out of the sky," said Trish Carbone, a technology manager at Mitre.

Data mining also can be used to improve computer security, said Kristin

Nauta, data mining program manager at SAS in Cary, N.C. Mining network logs

could uncover patterns of intrusion that system operators could not detect

in other ways. Data mining could also point out holes in computer security

systems that let intruders enter, she said.

For the IRS, data mining is a way to improve customer service, said

IRS data mining specialist Ester Brook-Jones. By analyzing incoming requests

for help and information, the IRS hopes to be able to schedule its work

force better to provide faster, more accurate answers to taxpayers' questions.

For the past year, the Department of Veterans Affairs has been using

data mining to predict demographic changes among its 3.6 million patients

and project collections from insurance companies. The technology enables

the VA to send Congress more accurate budget requests, said Robert Hinson,

the VA's director of communications and special studies services.

Agencies such as the VA, which spends about $19 billion annually to

provide medical care to veterans, are under increasing pressure to show

that they are operating efficiently. For many, data mining is becoming the

tool of choice to highlight good performance or dig out waste.

"Data mining is excellent at detecting patterns where things might not

be working right," Battaglia said, whether it is multiple Social Security

checks going to different names at the same address, or an unusual pattern

of Medicare billings by a doctor.

The potential for savings through data mining is enormous, said Herb

Edelstein, president of Two Crows Corp., a Potomac, Md., data mining company.

Consider government pensioners. Including Social Security recipients,

retired military personnel and retired government workers, about 10 million

Americans receive government pension payments. It is not uncommon when they

die for their pension payments to continue. Even if the system were 99.9

percent accurate at stopping checks for those who die, 10,000 payments would

still be mailed to deceased recipients, Edelstein said.

By using data mining to analyze data the government already has on pension

recipients — age, health and other factors — it is possible to determine

those who are most likely to have died. Pension administrators then know

which recipients to check to ensure they are still alive. If the average

pensioner receives $10,000 per year, eliminating payments to retirees who

have died could save the government $100 million per year.
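
The arithmetic behind those figures can be checked in a few lines. This is a back-of-the-envelope sketch using only the numbers quoted above; the variable names are illustrative, and it assumes, as the $100 million figure implicitly does, that each missed payment stream runs for a full year.

```python
from fractions import Fraction

recipients = 10_000_000          # about 10 million pension recipients
accuracy = Fraction(999, 1000)   # 99.9 percent accurate at stopping checks
average_pension = 10_000         # dollars per pensioner per year

# Share of cases the system misses, kept as an exact fraction to avoid
# floating-point rounding: 1 - 0.999 = 0.001.
miss_rate = 1 - accuracy

missed_payments = int(recipients * miss_rate)
annual_cost = missed_payments * average_pension

print(f"Payments still going to deceased recipients: {missed_payments:,}")
print(f"Potential annual savings: ${annual_cost:,}")
```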

Although some agencies are veteran data miners — such as the National

Institutes of Health, which drills into databases to learn how well medical

treatments work — other agencies have just begun to sift through the mounds

of stored data.

The Navy, for example, recently established a data warehouse to manage

the distribution of torpedoes throughout the Pacific submarine fleet, said

Rear Adm. Charles Munns, the U.S. Pacific Fleet's deputy chief of staff

for command and control and requirements and resources.

"We have to make sure the right torpedo gets to the right ship, and

now we know at the command level where all those torpedoes are," he said.

The Navy also uses data mining to better manage logistics. Using an

Oracle Corp. database system to keep track of parts and spares enabled the

command to forgo its annual 12-person visits to submarines for a hands-on

inventory.

Steve Petchon, a partner at Andersen Consulting who works with federal

agencies, said military logistics operations are a logical application for

data mining. In the private sector, companies such as UPS and Federal Express

rely on data mining to keep track of the goods they ship. But to succeed

at data mining, Petchon said, the military will have to replace its stovepiped

data systems so that data can be collected in a repository where multiple

users can get to it easily.

Privacy Boundaries

Other possible uses for data mining by the federal government might

include predicting which job candidates are most likely to succeed if hired,

and which government workers seeking security clearances are least likely

to commit security violations, Edelstein said.

But proposing those sorts of uses for data mining triggers alarms among

privacy advocates, who warn that data mining poses a serious threat to privacy.

It is not fair to rely on statistics to predict who will or won't be

a successful employee, said Andrew Shen, a policy analyst at the Center

for Democracy and Technology. "This sort of technology implies a cookie

cutter mold that has a lot of flaws. What happens if there's a mistake in

the data?"

To Shen and others, the increasing ability of individuals, government

agencies and businesses to tap databases to compile extensive collections

of personal information raises the peril of compromising personal privacy.

"Until now, we have been able to have privacy through obscurity," said

Beth Givens, director of the Privacy Rights Clearinghouse in San Diego.

Personal data has long been collected and maintained as public records in

file cabinets in courthouses, tax offices, city halls and federal office

buildings. It wasn't easy to find and it was harder to aggregate.

That information is now easily accessible via the Internet or in databases.

Records of divorces, bankruptcies and property ownership and applications

for professional licenses are being stored in digital format. By compiling

them, it is often possible to construct a detailed profile of an individual.

Combine that with the data amassed by credit card companies, catalogue sales

outlets, phone companies, banks, Internet companies and even supermarkets,

and a vast dossier of personal details can be compiled about most people.

That is valuable information to marketers peddling goods and services

ranging from insurance to vacation properties. Such data also may be valuable

to law enforcement agencies whose previous information collecting was limited

by the need for warrants, Givens said.

"If we feel uncomfortable about that, we should have a public discussion

about the information being collected and who gets access to it," said

Givens, whose clearinghouse crusades for protecting privacy.

The trend is in the other direction, however.

The use of data mining to delve deeply into databases is expected to

increase at a rate of 50 percent to 60 percent per year.

In government, the push to cut costs by reducing staffs will only encourage

more data mining because the technology enables fewer people to manage data

better, Battaglia said.
