Storage dilemma looms

As the federal government closes in on fixing computer systems for the Year 2000 bug, another information technology crisis is building: whether all the data that federal computers are generating can be preserved into the next century.

A white paper published last month by a team of NASA scientists warns that data storage technology is not keeping pace with the enormous volume of data created by government agencies and the private sector. As a result, terabytes of information, including weather data, weapons schematics and financial transactions, are at risk of being lost forever.

"We know paper lasts for hundreds of years," said Milton Halem, chief of the Earth and Space Data Computer Division at NASA's Goddard Space Flight Center and the primary author of the paper. "We have satellite observations that we need to save for more than 100 years...for their potential value to scientific and societal applications."

The main culprit is data transfer rate: the speed at which data can be moved from an old tape to a new one. The data transfer speeds of computers have only quadrupled in the past decade, while the capacity of storage tapes has increased 100-fold.

For example, it will take four years for NASA's Center for Computational Sciences to copy 28 terabytes—more than 28 trillion bytes—of data to higher-capacity tapes. But by the time the project is done, the new tapes will have only six years of life expectancy left, so NASA will have to quickly start copying the tapes all over again. Meanwhile, the system will have taken in more than 70 terabytes of new data, which must be stored as well.
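The scale of that arithmetic is easier to see as a back-of-envelope calculation. The Python sketch below uses only the figures quoted here; the steady copy rate and the year-by-year comparison are illustrative assumptions, not a description of NASA's actual migration process.

```python
# Back-of-envelope sketch of the tape-migration arithmetic described above.
# The figures (28 TB copied over 4 years, 70 TB of new data arriving in the
# same window, 6 years of media life remaining) come from the NASA example;
# everything else is an illustrative assumption.

TB = 10**12  # bytes per terabyte

archive_bytes = 28 * TB          # data to migrate to higher-capacity tapes
migration_years = 4              # time the copy is expected to take
new_data_bytes = 70 * TB         # data expected to arrive during migration
media_life_years = 6             # useful life left on the new tapes

seconds = migration_years * 365.25 * 24 * 3600
effective_rate = archive_bytes / seconds  # implied sustained copy rate

copy_rate_tb_per_year = (archive_bytes / TB) / migration_years
inflow_tb_per_year = (new_data_bytes / TB) / migration_years

print(f"Implied sustained copy rate: {effective_rate / 1e6:.2f} MB/s")
print(f"Copying {copy_rate_tb_per_year:.0f} TB/year while taking in "
      f"{inflow_tb_per_year:.1f} TB/year of new data")
print(f"Next migration must begin within {media_life_years} years "
      f"of finishing this one")
```

Under those assumptions, the archive grows more than twice as fast as it can be recopied, which is the squeeze the NASA paper describes.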

The task is multiplied for all of NASA, which holds thousands of terabytes of space telescope observations, mission simulations and aeronautical data. "We're being presented these problems well before other agencies," said Lee Holcomb, NASA's chief information officer. Nevertheless, he added, "I think the issues we deal with are very common to everyone else."

Other technology shortcomings include problems with backward compatibility between new storage systems and older components as well as inadequate means to keep data secure.

Tom Engel, manager of high-performance systems with the National Center for Atmospheric Research (NCAR), a federally funded weather research facility in Boulder, Colo., said he is trying to restore data from seven-track tapes, a technology that was popular 20 years ago but is no longer manufactured. "We had a heck of a time finding a seven-track tape reader," he said, adding that the same fate could befall current storage media.

Fred Rybczynski, a product manager with Storage Technology Corp., which supplies tape drives to the government, said agencies should ask their storage vendors what they have planned for the products the agencies use, so the agencies can prepare for the phase-out of old products. "In the past, there hasn't been such a strategy," he said. Rather, the industry has changed direction abruptly.

StorageTek recently began shipping a faster drive designed to work with multiple computing platforms. The company has mapped out three generations of the product for customers.

Money to cover storage costs also is a looming issue. According to current projections, Engel said, by 2005 it will cost $40 million to $50 million to store all the data NCAR collects, an amount that exceeds the cost of new supercomputers to analyze it. The budget does not include copying what will be about 500 terabytes of information to new media as old tapes wear out. "We are attempting to put capability in place [so we do not] have to migrate all the data," he said.

Agencies also worry about whether they will be able to retrieve what they save. "The issue for many groups is how to migrate and capture the structure of the data and its meaning," said Reagan Moore, associate director for enabling technologies at the San Diego Supercomputer Center at the University of California, San Diego.

"Information access is all-important,'' said Jack Cole, a lead adviser on storage issues with the Army Research Laboratory, Aberdeen, Md., which operates supercomputers used to model battlefield exercises, weaponry and other military research.

But with "no real firm standards" governing software for high-capacity storage systems, one cannot necessarily read data recorded on different systems.

With about 210 terabytes of data in storage, ARL is not "in a bind" right now, Cole said, but, like other labs, ARL expects its archives to grow. Last month, the office that runs the Defense Department's High-Performance Computing Modernization Program convened a committee to define common storage requirements for its supercomputing centers. The goal is to ensure that data collected by one center can be used by others.

Halem agreed that maintaining applications and formats is important, but he called it "a higher-order problem." He said the industry first has to figure out how to keep all the information. His paper calls for the government to fund a research test bed that firms could use to ensure their products can be integrated with others on the market. He also suggested that agency CIOs get a handle on the storage problem by collecting annual reports on data growth.
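Tracking annual growth that way amounts to simple trend projection. The sketch below shows one way such a projection might look; the starting archive size and the growth rate are hypothetical placeholders, not figures from any agency report.

```python
# Minimal sketch of the kind of data-growth tracking Halem suggests CIOs
# collect. The starting archive size and the annual growth rate are
# hypothetical placeholders, chosen only to show how quickly an archive
# compounds and why storage and migration capacity must be planned early.

start_tb = 100.0        # hypothetical archive size today, in terabytes
annual_growth = 0.60    # hypothetical 60 percent year-over-year growth
years = 7               # projection horizon

volume = start_tb
for year in range(1, years + 1):
    volume *= 1 + annual_growth
    print(f"Year {year}: projected archive of {volume:,.0f} TB")
```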