IT tools can help with the declassification backlog, but is there funding?
Three pilot projects designed to prove out the software-based analysis concept have convinced members of the Public Interest Declassification Board that some level of automation can be introduced into the declassification process.
The federal government is facing a mounting pile of electronic documents and other material that are due to be reviewed for declassification. But there just aren't enough people or enough money to meet the statutory deadlines.
A pilot program funded by the CIA in collaboration with the National Archives and Records Administration examined whether automated tools could speed the process by performing the initial analysis of documents and making recommendations to human analysts for release, redaction or reclassification.
The answer, so far, appears to be yes.
Three pilot projects designed to prove out the software-based analysis concept, conducted at the Center for Content Understanding at the University of Texas' Applied Research Laboratories, have convinced members of the Public Interest Declassification Board and leaders at NARA that some level of automation can be introduced into the declassification process.
"Automation can help humans work more efficiently by drawing attention to critical questions and highlighting items that it would take people a long time to scan for in documents. It can also make humans more effective by bringing to bear external information," said Cheryl Martin, a research scientist who is leading the pilots at the Applied Research Laboratory, at a June 25 meeting of the Public Interest Declassification Board.
Martin's group created a tool called Sensitive Content Identification and Marking (SCIM) by combining pieces of open source software for machine learning and unstructured-data processing. An initial pilot project looked at classification decisions and proved out the viability of computer-generated decision support. But that effort also pointed to the need for better guidance and rules for deciding when to classify materials.
"Classification guidance is written to be interpreted by humans. It often lacks the specificity and the precision that a computer needs to make a determination," Martin said. She noted that often professionals were not able to adequately explain how they made certain decisions.
"Subject matter experts know how to classify. But like most people who know how to do things, they just know," Martin said. Ultimately this pilot ended when Martin's group ran out of classified material – such as published articles from internal CIA journals -- that was deemed appropriately sensitive for this kind of analysis.
Creating chaos?
In public comments at the end of the meeting, Steven Aftergood of the Federation of American Scientists said that the drive toward automation should be accompanied by a rewrite of classification guidelines.
"If we have vague and confusing guidance of the kind that I think we do have today, then automating its application at this point would create chaos," Aftergood said.
Another pilot that performed quality assurance checks on manually reviewed and redacted declassified documents showed that it was possible, using a combination of keyword selection, contextual understanding and rules, to whittle down a large number of documents to a small, manageable number for human review.
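As a rough illustration of that triage idea, the sketch below combines a keyword pass with a simple contextual rule to flag only a few documents for human eyes. It is an assumption-laden stand-in rather than the pilot's actual method; the sensitive terms, rule and documents are invented.

```python
# A hedged sketch of the QA-check idea: triage a large set of
# already-redacted documents down to a short list for human review.
# Keywords, rules, and documents are all illustrative stand-ins.
import re

SENSITIVE_TERMS = {"codeword", "source identity", "collection method"}

def needs_human_review(text: str) -> bool:
    lowered = text.lower()
    # Keyword pass: any sensitive term surviving redaction is suspect.
    if any(term in lowered for term in SENSITIVE_TERMS):
        return True
    # Rule pass: e.g., an unredacted year appearing near a location marker.
    if re.search(r"\b(19|20)\d{2}\b.*\bstation\b", lowered):
        return True
    return False

documents = {
    "doc-001": "the [REDACTED] report was filed in 1987 at the station",
    "doc-002": "routine logistics memo, fully releasable",
    "doc-003": "reference to a collection method was left in place",
}

flagged = [doc_id for doc_id, text in documents.items()
           if needs_human_review(text)]
print(f"flagged {len(flagged)} of {len(documents)}: {flagged}")
```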
Martin's team is wrapping up work on a third pilot, using the still-classified Reagan presidential email archive as a test case. At the time, the White House used the IBM Professional Office System (PROFS), which allowed messages to be sent between terminals operating on the same mainframe. The National Security Council staff was an early adopter of PROFS.
The messages, in their archival format, were extremely difficult for people to read. They were preserved on a backup tape from the late 1980s and appeared as a single bitstream, making it hard to tell where one message ended and another began. The hope with the Reagan emails is to scan for "multiple agency equities," information that different organizations want to keep secret, contained within single documents. The White House emails present an ideal test case because they synthesize information from a variety of sources.
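A first step with such a tape is simply segmenting the stream back into individual messages. The fragment below sketches one way to do that by splitting on a recurring header pattern; the PROFS-style header format shown is a made-up placeholder, since the real tape layout is not described here.

```python
# Illustrative sketch of the segmentation problem: the archived
# messages arrived as one undifferentiated byte stream, so a first
# step is splitting on whatever header pattern the mail system used.
# The header format below is a hypothetical placeholder, not PROFS.
import re

raw = (
    b"DATE: 03/12/86 FROM: NSC01 TO: NSC07 subject one body text "
    b"DATE: 03/13/86 FROM: NSC02 TO: NSC01 subject two more body"
)

text = raw.decode("ascii", errors="replace")

# Split wherever a new header begins, keeping each header with its body.
HEADER = re.compile(r"(?=DATE: \d{2}/\d{2}/\d{2} FROM: )")
messages = [m.strip() for m in HEADER.split(text) if m.strip()]

for i, msg in enumerate(messages, 1):
    print(f"message {i}: {msg[:40]}...")
```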
"If we can identify those equities accurately by automation, we can potentially make the referral process better and faster. And once the sensitive information is identified, this could be passed along to the individual agencies and help speed up their review process as well," Martin said.
The Reagan emails, reformatted as individual documents, are undergoing a manual review under the standard declassification procedure to see how the automated efforts track against the old-fashioned method. This fall, Martin will be able to validate SCIM's work against that of human reviewers. The project's funding runs out at the end of 2015, so unless the effort is renewed and scaled up, the pilots will be the end of the story.
The Public Interest Declassification Board is advocating an automated approach. Board Chair Nancy Soderberg, a senior foreign policy advisor during the Clinton administration, said "it's less risky to use technology because humans make mistakes. Machines can do this ad infinitum, and we get tired."
John Fitzpatrick, director of the Information Security Oversight Office at NARA and executive secretary of the PIDB, said he hopes to make the case in Congress that integrating automated tools into the declassification process deserves funding.