Archives to scale volumes of snapshots

NARA must determine how to store and search through 21 terabytes of agency Web site data

When the National Archives completes the task of collecting "snapshots" of all federal Web sites, it will have to figure out how to store and search through 21 terabytes of digital information.

"We're not scaled to do that now. We will have to build up the capacity to handle it," said Mike Miller, director of the Archives' modern records program.

The snapshots were ordered by the outgoing Clinton administration to preserve archival copies of federal Web sites as they existed Jan. 20. Senior Clinton officials said they wanted a record of the electronic government developed during their watch.

To archivists, the snapshots have a less specific but perhaps greater worth.

"We save these things for one reason and find that people find tons of ways to use them," Miller said. He said, for example, accounting records captured during the collapse of Nazi Germany sat largely unused for about a half century, but in recent years they have become valuable for tracing looted gold and treasure.

The snapshots of government Web sites are also certain to prove valuable, he said.

Some agencies may find them useful in settling legal disputes. Researchers will no doubt find them valuable for tracing the early development of electronic government.

"We felt we would be kicking ourselves if we did not" take the snapshots, Miller said. So far, 38 agencies, mainly small ones, have sent Web snapshots to the Archives, Miller said Feb. 16. There are at least three times that many federal agencies. The deadline is March 20.

Agencies must capture the Web site as it appeared Jan. 20, complete with working links between the site's pages and layers. Snapshots are being sent to the Archives on CD-ROMs or tape and eventually are to be transferred to digital linear tape for long-term storage.

If printed on paper, the 21 terabytes of Web data would be roughly double the amount of information contained in the Library of Congress' collection of 20 million volumes.

Because of the volume of data involved, the Archives does not want to make a practice of periodically collecting agency Web site snapshots. "We want to get this on a more regularized basis," Miller said. The record-keeping agency hopes to have new guidelines in place next month instructing agency Web managers on how to routinely preserve Web site records.
