NSA Turns to the Cloud to Help Manage Data Deluge
In 2010, NSA decided to pursue cloud as a “common space” where analysts could look across the agency’s entire river of collected information.
Four years ago, the National Security Agency realized it had a data problem.
In the aftermath of the Sept. 11, 2001, terrorist attacks, armed with new authorities that allowed the intelligence agency to collect an assortment of global communications and Internet metadata, NSA ramped up its operations. These expanded surveillance efforts—many of which were disclosed publicly in leaks by former NSA contractor Edward Snowden—produced a massive volume of data that began to flood the agency’s disparate databases.
By the mid-2000s, NSA was adding racks of servers to feed the voracious appetite of multiple programs, a costly effort that further burdened intelligence analysts, who had to access dozens of databases or more and run individual queries against each of them. Analysts could spend 30 minutes or longer each morning just opening the databases they needed to query, a senior NSA official told Government Executive.
“What we were doing was just adding servers, and that doesn’t scale at a certain point,” the senior official says. “We had a volume issue—more than a volume issue, our analysts are swamped going in and out of repositories.”
The Best Bet
NSA officials picked up on Google research in 2007 that ultimately paved the way for the intelligence agency’s formal adoption of cloud computing in 2010—part of a broad effort within the intelligence community to more effectively share data and services. In a similar move, the CIA last year signed a contract with Amazon Web Services for a cloud to be used by all 17 intelligence agencies.
As private sector companies were beginning to turn to the cloud for cheaper, scalable on-demand computing, NSA came to consider the approach as its best bet to manage its deluge of data. In 2010, NSA decided to pursue cloud as its “repository of choice”—a common space where all analysts could look across the agency’s entire river of collected information.
“The common space is hugely important for us—and the ability to discover,” says the NSA official. “We wanted to make all data discoverable. That’s the approach we took.”
NSA makes the data discoverable through metatagging. Each piece of intelligence information in the NSA cloud is marked with telling details, such as when it entered the system, who put it there and who has the ability to access it.
Security and compliance are both essential for NSA, especially following Snowden’s leaks of classified information in 2013. In designing its cloud, the agency made sure to account for both. Not only are the actions of analysts and administrators monitored, but they also have access only to information they are credentialed to see. Lonny Anderson, NSA’s chief technology officer, summed up the agency’s approach to security this way in a September speech in Washington: “Tag the data, tag the people.”
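NSA has not published how this tagging works under the hood. As a purely illustrative sketch, the “tag the data, tag the people” model can be pictured as records carrying provenance and handling metadata, users carrying credentials, and every access checked against both and logged. All names, fields, and logic below are assumptions for the sake of the example, not the agency’s actual schema or access-control system.

```python
# Illustrative sketch only: field names and logic are assumptions,
# not NSA's actual schema or access-control system.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class TaggedRecord:
    """A piece of data tagged with its provenance and handling rules."""
    payload: bytes
    ingested_at: datetime           # when it entered the system
    ingested_by: str                # who put it there
    required_credentials: set[str]  # who has the ability to access it


@dataclass
class Analyst:
    """A user tagged with the credentials they hold."""
    user_id: str
    credentials: set[str]


audit_log: list[str] = []


def read_record(analyst: Analyst, record: TaggedRecord) -> bytes | None:
    """Return the payload only if the analyst holds every required credential.
    Every attempt, allowed or denied, is recorded for later auditing."""
    allowed = record.required_credentials <= analyst.credentials
    audit_log.append(
        f"{datetime.now(timezone.utc).isoformat()} {analyst.user_id} "
        f"{'READ' if allowed else 'DENIED'} record from {record.ingested_by}"
    )
    return record.payload if allowed else None


# Usage: an analyst missing one required credential is denied and the attempt is logged.
record = TaggedRecord(
    payload=b"...",
    ingested_at=datetime.now(timezone.utc),
    ingested_by="collection_system_A",
    required_credentials={"SIGINT", "PROGRAM_X"},
)
analyst = Analyst(user_id="jdoe", credentials={"SIGINT"})
assert read_record(analyst, record) is None
```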
From a compliance perspective, NSA’s cloud relieves analysts of having to know when surveillance laws are modified, because those updates can be incorporated centrally, much the way security updates are rolled out. This means analysts don’t have to make their own judgments about what they are allowed to see: the data before them is data they are legally allowed to access.
“As we made the decision to go all in [with] cloud in 2010, we made the equally overt decision to build compliance in,” the senior NSA official says. “What we don’t want is for an analyst to have to think, ‘Am I authorized to see this data?’ ”
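Again as an illustration rather than a description of NSA’s system, and reusing the hypothetical TaggedRecord, Analyst, and read_record types from the sketch above, “building compliance in” centrally could mean routing every query through a single policy table. When a legal authority changes, only that table is updated, and analyst queries automatically return only what the current rules allow. The retention-style policy below is an invented example of such a rule.

```python
# Hypothetical sketch: a single, centrally maintained policy applied to every
# query, so a legal change is one update here rather than a judgment call by
# each analyst. Categories and retention limits are invented for illustration.
from datetime import datetime, timedelta, timezone

RETENTION_POLICY = {
    "metadata": timedelta(days=5 * 365),
    "content": timedelta(days=365),
}


def compliant(record: TaggedRecord, now: datetime, category: str) -> bool:
    """A record may be returned only if it falls inside the retention window
    currently allowed for its category."""
    limit = RETENTION_POLICY.get(category)
    return limit is not None and now - record.ingested_at <= limit


def query(analyst: Analyst, records: list[tuple[str, TaggedRecord]]) -> list[bytes]:
    """Apply the central compliance filter first, then the credential check,
    so anything the analyst sees is data they are allowed to access."""
    now = datetime.now(timezone.utc)
    results = []
    for category, record in records:
        if compliant(record, now, category):
            payload = read_record(analyst, record)
            if payload is not None:
                results.append(payload)
    return results
```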
Converging Clouds
For now, NSA’s private cloud is actually two parallel clouds. The first is an internal cloud accessible to its employees. The second, called GovCloud, is available to the intelligence community through the Joint Worldwide Intelligence Communications System. With GovCloud, NSA acts as a cloud service provider to other IC agencies, allowing them to order services such as computing and analysis.
The goal, according to senior NSA officials, is to merge these parallel clouds by the end of 2015, thereby eliminating the burden of maintaining separate systems as well as the agency’s reliance on disparate databases. NSA plans to begin closing legacy repositories within the next few months, migrating information from purpose-built repositories to a single pool as its cloud reaches full operation.
The transition has been far from seamless, with NSA experiencing some outages. Migrating data to the cloud also has proved difficult, particularly the merging of duplicative data sets.
But the agency’s culture may be the biggest hurdle. NSA has organized a series of initiatives designed to acclimate analysts to life in the cloud. One of them, called I-CAFÉ, for Cloud Analytics Fusion Environment, puts analysts and developers together to brainstorm. Another, Future Architecture Transition (FAT) Tuesdays, brings groups of 150 to 200 analysts together regularly and forces them to work in the cloud and refrain from using legacy repositories.
“We’re learning in public and trying to do this while we support missions all over the globe,” the senior official says. “We’ve had some bumps, but I think we’ve worked through those. Our strength in GovCloud is that we have a common space for all data sets. We’re close to ready for prime time.”
Together, the NSA GovCloud, built from open-source software stacked on commodity hardware, and the CIA’s Amazon-built C2S cloud represent a joint storefront for intelligence community data. Over time, the interoperable clouds will share data and handle an increasing variety of requests from IC agencies.
NSA is likely a few years from completing its transition to the cloud, but its $1.5 billion data center in Bluffdale, Utah, illustrates the scope of the effort. It is essentially a 1 million-square-foot repository for signals intelligence designed explicitly to collect digital data.
In 2012, Utah Gov. Gary Herbert declared the Utah Data Center would be the first to house a yottabyte of data—the equivalent of 1 billion petabytes, or more than 1 billion times all digitized content in the Library of Congress. While NSA has not disclosed the actual storage capacity of the Utah Data Center, it has declared on its website that its capacity will be “alottabytes.”
Whatever the capacity really is, it will give NSA analysts around the globe a lot of information to sift through when the cloud transition is complete.