Future electronic records archives bet on XML

The survivability of digital information over time is the biggest challenge the National Archives and Records Administration faces

The survivability of digital information over time, particularly as agencies rely less on paper and more on electronic formats, is the biggest challenge the National Archives and Records Administration faces. How can it ensure that documents will be readable by future generations when the software used to store those files will likely no longer be used?

NARA appears to have found an answer in the electronic archives program, an estimated $130 million project that will use XML to help ensure that all documents can be read without the software that produced them. "With XML, it doesn't matter where the technology goes," said Kenneth Thibodeau, director of NARA's Electronic Records Archives Program. "The XML tools are simple enough that future computers should be able to deal with [the data]."

As part of the electronic archives program, electronic documents will be converted into XML and given XML tags that describe elements of the document such as a name or Social Security number. Document type definitions will describe the content and structure of a document, and style sheets will describe how a particular document is to be formatted. The records will then be stored on a tape cartridge, which in turn will be stored in a type of data warehouse.
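The tagging step described above can be illustrated with a minimal Python sketch. It is not NARA's implementation. The element names (`record`, `name`, `ssn`, `body`), the document type definition and the helper functions are all hypothetical, chosen only to show how descriptive tags let a later reader pull out a field without the original authoring software. Python's standard `xml.etree.ElementTree` parser reads the embedded document type definition but does not validate against it, and style sheets are left out of the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type definition: it states which elements a
# record contains and in what order. It is carried with the record here;
# ElementTree does not validate against it.
DTD = """<!DOCTYPE record [
  <!ELEMENT record (name, ssn, body)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT ssn (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>"""


def tag_record(name, ssn, body):
    """Wrap the parts of a document in descriptive XML tags."""
    record = ET.Element("record")
    ET.SubElement(record, "name").text = name
    ET.SubElement(record, "ssn").text = ssn
    ET.SubElement(record, "body").text = body
    return record


def serialize(record):
    """Return the record as plain text: XML declaration, DTD, then content."""
    xml_body = ET.tostring(record, encoding="unicode")
    return '<?xml version="1.0"?>\n' + DTD + "\n" + xml_body


def read_field(xml_text, field):
    """Recover one tagged field using only a generic XML parser."""
    root = ET.fromstring(xml_text)
    return root.findtext(field)
```

Because the serialized record is plain text with self-describing tags, `read_field(serialize(tag_record(...)), "name")` needs nothing beyond a generic XML parser, which is the property the program is counting on.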

NARA plans to take XML even a step further. "As it turns out, to do what the Archives needs to do to deliver authentic records over time, researchers determined that document type definitions and style sheets are not sufficient," Thibodeau said. "What they're exploring is XML topic maps as a way to represent the knowledge we have and need to communicate."

XML topic maps will help the Archives connect records with agencies' business processes and search and mine the data later. "To the extent you're keeping government records, you need to be able to link the records with the original activity," Thibodeau said. "You can impose any number of topic maps on the same body of information. We know that the ability to mine the records using this technology will be very helpful for us in producing the descriptions the citizens use to find out [which] government records might have government information in them."
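The idea of imposing several topic maps on one body of records can be sketched in Python as follows. This is a simplified illustration, not the XML topic map syntax and not the design NARA's researchers are exploring. The `TopicMap` class, the record identifiers and the topic names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMap:
    """A named index from topics to the records where they occur."""
    name: str
    occurrences: dict = field(default_factory=dict)  # topic -> set of record ids

    def link(self, topic, record_id):
        """Note that a record is an occurrence of a topic."""
        self.occurrences.setdefault(topic, set()).add(record_id)

    def records_for(self, topic):
        """Return the ids of records linked to a topic, in sorted order."""
        return sorted(self.occurrences.get(topic, set()))


def build_example():
    """Build two independent maps over the same hypothetical records."""
    records = {
        "rec-001": "Grant application",
        "rec-002": "Grant award letter",
        "rec-003": "Travel voucher",
    }

    # One map ties records to the business activity that produced them.
    by_activity = TopicMap("business activity")
    by_activity.link("grant processing", "rec-001")
    by_activity.link("grant processing", "rec-002")
    by_activity.link("travel reimbursement", "rec-003")

    # A second map over the same records groups them by subject instead.
    by_subject = TopicMap("subject")
    by_subject.link("funding", "rec-001")
    by_subject.link("funding", "rec-002")
    by_subject.link("funding", "rec-003")

    return records, by_activity, by_subject
```

The records themselves are never altered. Each map is a separate layer of links, so a new map can be added later for a new kind of search without touching the stored data.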

Thibodeau anticipates that agencies will already be using XML for business by the time the electronic archives project is operational, about four years from now.