Saving it for the future
Agencies wrestle with 'full cost of digital preservation'
Some agencies may believe they finally have a handle on producing e-mail, Web sites and other digital documents. But unless those agencies have also figured out how to store them—for a year or a generation—they risk violating federal archiving regulations.
About 150 federal archivists, librarians, computer scientists, online services vendors and others gathered recently in Washington, D.C., for the Federal Library and Information Center Committee's 2001 Forum to discuss the government's scattered stabs at the problem—and the Library of Congress' new $100 million legislative effort to provide a blueprint for preserving the government's digital heritage.
"Information professionals believe it's about time for the federal government to acknowledge the full cost of digital preservation," said Susan Tarr, executive director of FLICC. "I don't think any of the big Cabinet-level agencies has really focused on this."
Dubbed the National Digital Information Infrastructure and Preservation Program, the LOC project will bring together representatives from the White House Office of Science and Technology Policy, the Commerce Department, the National Archives and Records Administration and other agencies. Under the leadership of Librarian of Congress James Billington, participants will begin to standardize policies and procedures for collecting, storing and indexing digital material.
"We will be putting together a plan and be going back to Congress for their approval," said Laura Campbell, LOC's associate librarian for strategic initiatives. "They wanted us to lay out a plan and come back with something that is reasonable."
The program provides $5 million immediately for planning and for preserving digital material that may vanish before the plan takes effect.
"Even in our offices, we are losing [e-mail] messages at an enormous rate," said Sen. Ted Stevens (R-Alaska), a forum speaker and sponsor of the legislation. If this continues, he said, it will hamper future citizens' ability to know "what went on in this democracy in this period."
After Congress approves the LOC plan, $20 million will be added to the $5 million already available to put it into effect, with the final $75 million available over the next two years as matching grants to nonfederal donations.
The project should shape national standards for preserving and retrieving digital materials, said Deanna Marcum, president of the Council on Library and Information Resources, a nonfederal organization that will partner with LOC.
For example, NASA collects troves of digital satellite imagery and information that need to be preserved, she said. "How are we going to be able to retrieve that information so that scientists will be able to make sense of that data years from now?"
Electronic journals, the academic and research journals that are "born digital" (edited and issued electronically, with no paper counterpart), also present challenges, Marcum said. Unless such journals are deposited in archival repositories and migrated to new systems when platforms change, they risk being lost, she said. That would violate a basic tenet libraries follow.
"Libraries have a unique responsibility to transfer materials to the next generation and the generation after that," Marcum said.
"E-journals and e-books are going to be an increasing issue for libraries, as well as other scholarly materials," such as dissertations and theses, said Taylor Surface, manager of distributed systems at the Online Computer Library Center Inc., another organization involved in the LOC project.
Although federal law is now interpreted to require agencies to preserve many electronic records along with paper documents, no federal standards exist for doing so. Instead, several agencies are grappling with the problem on their own.
As the ultimate caretakers of much of this data, National Archives and Records Administration officials are perhaps furthest along in thinking about how to tag records so that they can be found later.
As digital documents pile up into the billions, keyword searches become useless. Even a narrow search of the recently archived Clinton White House e-mail can return tens of thousands of hits, said Lewis Bellardo, deputy archivist of the United States.
It will become critical to infuse documents with indexing information as they are created, Bellardo told FLICC forum attendees. All arms of the government must work with NARA to modify the way they generate information, he said.
Still, not every record needs to be archived, and others change over time. Officials at the National Library of Medicine have created a way to classify federal documents' longevity and changeability. NLM's "permanence rating" system tells whether a digital record will be eternally available, whether it will have a constant Web address and whether it will change incrementally, radically or not at all.
It took a year to develop the scheme; NLM officials hope to put it into practice soon.
How tough is it to implement a departmentwide digital preservation plan? Agriculture Department officials have been working on the issue since 1997, but a set of draft guidelines still awaits approval from the department's chief information officer.
"We've had a certain amount of progress; perhaps not as much as we would have liked, but we're moving ahead," National Agricultural Library Director Pamela Andre told attendees.
At that rate, USDA officials could find themselves overtaken by LOC's project. But when it comes to setting standards for the entire government, Stevens is satisfied with a deliberate pace.
"I think we could make big mistakes if we move too early," the senator said. "If I have my way, we'll wait until Dr. Billington completes his work."
Peniston is a freelance writer based in Washington, D.C.