The Cloud is Vital for Sharing Census Data
National Archives and Records Administration Chief Information Officer Michael Wash has run some numbers since NARA put the 1940 census online April 2.
NARA outsourced hosting for the site to a cloud-based system managed by Archives.com. The genealogy site won a solicitation to host the platform on a no-cost contract, essentially for prestige and free branding.
If NARA had hosted the census schedules itself in a traditional data center, it would have cost between $4 million and $5 million, Wash said. If the agency had hosted the census in its own cloud, it would have cost about $250,000 for the first month, when demand was at its peak, and about $50,000 every month thereafter, he said.
The lesson: cloud-based solutions, which are already a major part of the Archives’ plans for housing new data, must also be an important component of how the agency shares data with the public, Wash said.
He was speaking at a “Cloud Computing Brainstorming Session” Wednesday hosted by the federal information technology group MeriTalk.
Computer clouds allow users to pay for storing and transferring computer data like a utility, paying only for what they use. That makes the cloud an especially attractive option for the Archives, which tends to see massive surges in views and downloads when it first releases new materials and a sharp drop-off thereafter, Wash said.
The 1940 census, the first to be publicly released since the wide adoption of cloud computing, saw 200 million individual downloads in its first month, Wash said. That’s 200 terabytes’ worth of download data, he said, or about 400 years of continuous MP3 music.
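Wash's comparison can be checked with some quick arithmetic. The sketch below is illustrative only: it assumes decimal terabytes and an MP3 bitrate of 128 kilobits per second, neither of which Wash specified.

```python
# Rough check of the figures Wash cited for the first month of downloads.
# Assumptions (not from Wash): decimal terabytes, 128 kbps MP3 audio.
TB = 10**12                      # bytes in one decimal terabyte
MP3_BITRATE_BPS = 128_000        # assumed bitrate, in bits per second

total_bytes = 200 * TB           # 200 terabytes of download data
downloads = 200_000_000          # 200 million individual downloads

# Average size of a single download, in megabytes.
avg_download_mb = total_bytes / downloads / 10**6

# How long the same volume of data would play as continuous MP3 audio.
playback_seconds = total_bytes * 8 / MP3_BITRATE_BPS
playback_years = playback_seconds / (365.25 * 24 * 3600)

print(f"Average download: {avg_download_mb:.1f} MB")
print(f"Continuous MP3 playback: {playback_years:.0f} years")
```

Under those assumptions, the average download works out to about 1 megabyte, and the playback time lands just under 400 years, in line with Wash's estimate.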
After crashing during its first few hours, the Archives.com platform was able to manage the immense traffic.
When NARA released President Nixon’s long-sealed Watergate trial testimony, the first round of downloads amounted to 5,000 terabytes, Wash said.
The raw .tiff image files of the 1940 census schedules and maps were about 120 terabytes of information, Wash said. The Archives rendered those images down to about 4 million JPEG files, between 15 and 16 terabytes.
That was a significant reduction but still too much to transfer to Archives.com via the Internet.
“It would have taken too long,” Wash said. “We had to put it on a transfer device and ship it across the country.”
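A back-of-the-envelope estimate shows why shipping a device can beat a network transfer. The sketch below is hypothetical: Wash did not say what bandwidth was available, so the link speeds are assumptions chosen for illustration, and the calculation ignores protocol overhead and interruptions.

```python
def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days needed to move size_tb decimal terabytes over a link that
    sustains link_mbps megabits per second, ignoring overhead."""
    bits = size_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6)
    return seconds / 86_400

# 16 TB is the upper end of the JPEG set's size; link speeds are assumed.
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbps: {transfer_days(16, mbps):6.1f} days")
```

At a sustained 100 megabits per second, for example, 16 terabytes would take roughly 15 days to send, compared with a few days for a courier.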