NARA's skills gap

Report suggests records agency needs more IT expertise

The National Archives and Records Administration lacks the technical experience necessary to find a way to deal with the growing number of electronic records created across the federal government, a new report warns.

Having long served as the government's primary custodian of paper documents, NARA does not have the IT know-how needed to understand the management of electronic records, according to an interim report by the National Academies' Computer Science and Telecommunications Board.

To many at NARA, in short, a records-keeping system still means a file cabinet.

"In addition to needing a quick ramp-up in the IT expertise necessary to oversee the early phases of procurement, NARA faces a longer-term need for a more pervasive culture change — IT skills related to preservation will need to be a core competence throughout the organization," according to the report.

"One of the biggest challenges NARA is going to face is developing that capability internally," said Julie Gable, a records management consultant based in Wyndmoor, Penn.

The program management office has known about the skills gap for some time, but as officials rework the management plan and schedule for the electronic records program, they continue to identify more areas in which the office needs internal technical expertise, said Kenneth Thibodeau, director of NARA's Electronic Records Archives (ERA) program. This includes areas such as enterprise architecture and information security, he said.

NARA is working to develop a way to get a handle on the number of electronic records being created and stored. The ERA office requested that the board review the situation to get a broader look at challenges and potential solutions to long-term digital record-keeping. The full report is due at year's end.

Most of the recommendations confirm the direction in which the program is already headed, Thibodeau said. Considering the problem's complexity, that is good news because it means that the agency has not wasted the years it has taken to get to this point, he said.

"I don't see any major change in direction" because of the recommendations, Thibodeau said.

In fact, the report could provide "excellent ammunition" for the program and the agency to use against opponents of the plan or its funding, said John Phillips, an electronic records management consultant with Knoxville, Tenn.-based Information Technology Decisions.

Long-term electronic archiving will require technology to store the records, but just as important, it will also mean that people must be able to access them. That sounds easy, but the problem is that the records must be accessed in their original format and context or they could lose a good deal of their value.

The access problem is an issue archivists and technology experts still struggle to address. Although recent advances in technology, such as Extensible Markup Language (XML) and PDF, have made document integrity more reliable, they still cannot begin to handle the challenge of how to access those documents years from now, when the current technology has completely transformed.

Beyond the ERA program, many efforts are under way to address aspects of the problem. Two basic recommendations from the board are for NARA to get more actively involved in all of the efforts and for the government to pool its resources to address common needs.

NARA is already doing this to some degree, and during the last year has stepped up its involvement with other agencies and groups. As officials have reached out to larger digital records communities — such as academic libraries — they are finding that many others are addressing parts of the problems the archives agency faces, Thibodeau said.

The federal CIO Council's XML Working Group is now starting to work with NARA and other records management organizations to take advantage of the technology's possibilities for describing electronic records, an important factor in long-term archiving, said Owen Ambur, co-chairman of the XML Working Group and former vice chairman of the Federal Information and Records Managers Council.

The overall solution, however, should not rely too much on XML, Phillips said. "It is an important tool and will definitely be an important part of the persistent archives. But they need to not assume that everything will be solved by that," he said.

The report encourages NARA to move forward with a careful, modular procurement, starting with many small pilot programs to address focused aspects of the overall problem, and then pulling together the pieces farther down the line.

The recommendation matches NARA's plans, Gable said. At the same time, however, pilot projects will pose their own problems if NARA does not guard against the inclination to develop stovepiped systems, she said.

"The idea of developing a small number of focused pilot systems and then thinking that all those pilot systems will converge eventually... Maybe in the future, but with current technology, I think that's naive," she said.

Enterprise architecture and planning will play a role in this area, Ambur said.

The physical size of the archiving task is daunting enough. By 2014, NARA expects to receive almost 11 petabytes of records from agencies.

Somewhat to the surprise of NARA officials, given the forward-looking solutions they seek, the agency received more than 30 responses to its November 2002 request for information. Many of those companies, including many top federal integrators, came in to brief the program office throughout May, Thibodeau said. "Some of these companies see an emerging market here," he said.

The number of commercial products is "highly encouraging," Ambur said. "I think vendors are beginning to realize that records management isn't something that they can ignore anymore."

This response has been building during the last few years, Gable said. The software has improved and the integrators have started to step in to create enterprise applications and incorporate them into agencies' existing systems.

"In this technology market, NARA represents megabucks, and you would be a very remiss integrator if you didn't get in some kind of proposal," she said.

NARA will likely issue the final request for proposals for the electronic archiving system later this year, but with more than 700 requirements for the entire program, a lot of planning must be done before that point, Thibodeau said.

***

The National Academies' Computer Science and Telecommunications Board outlined several recommendations for the National Archives and Records Administration in its interim report on the Electronic Records Archives program.

The National Academies' recommendations:

* Work with other archiving programs and organizations on common digital preservation needs.

* Gather more information about the electronic records that need to be preserved before moving forward with a modular procurement.

* Address the lack of information technology expertise within the National Archives and Records Administration by training current employees, hiring new ones with specialized skills and contracting out for support.

* Start with a small number of focused pilot programs that will eventually converge into a comprehensive system.

***

Why e-records are a problem

The Electronic Records Archives (ERA) program is the government's long-term initiative to preserve digital records and provide access to them as they were originally stored. The effort faces many challenges.

Changing formats: Agencies already are struggling to access records stored in old versions of applications, such as Microsoft Corp.'s Word. The National Archives and Records Administration is trying to anticipate or account for the complete transformation of technology more than 100 years in the future.

Technology changes: Hardware changes quickly. For example, few computers today can access data stored on 5.25-inch floppy disks.

Volume: As records are being transferred to NARA, officials have discovered greater-than-expected volumes of data. The Clinton White House generated approximately 40 million e-mail messages in only one of its systems instead of the expected total of 10 million messages. Furthermore, the volume of e-records the ERA program expects to receive from agencies between 2005 and 2010 is estimated to make up almost a petabyte of data, and the total volume is expected to increase to 10.7 petabytes by 2014. One petabyte equals 1,024 terabytes, which is usually the largest term agencies have to use to describe their data volumes.

E-record definition: NARA and other archivists are still struggling to define what is necessary to maintain as an e-record, such as e-mail, multiple revisions of the same document and images.
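
For readers who want to check the volume figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the variable and function names are made up for this example, and it assumes the article's convention that one petabyte equals 1,024 terabytes.

```python
# Illustrative arithmetic only; all names here are hypothetical.
# Assumes the article's convention: 1 petabyte = 1,024 terabytes.
TB_PER_PB = 1024


def petabytes_to_terabytes(pb: float) -> float:
    """Convert petabytes to terabytes using the 1,024 convention."""
    return pb * TB_PER_PB


# Projected total ERA volume by 2014, as cited in the article.
projected_2014_pb = 10.7
projected_2014_tb = petabytes_to_terabytes(projected_2014_pb)

# Clinton White House e-mail in one system: expected vs. actual counts.
expected_messages = 10_000_000
actual_messages = 40_000_000
overrun_factor = actual_messages / expected_messages

print(f"{projected_2014_pb} PB is roughly {projected_2014_tb:,.0f} TB")
print(f"E-mail volume was {overrun_factor:.0f} times the estimate")
```

Under that convention, the 2014 projection works out to nearly 11,000 terabytes, and the e-mail count in that one system was four times what officials expected.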