How to make the most of your COOP investment

Managers are finding new ways to protect their computing resources in an emergency without paying for idle capacity that doesn’t contribute to daily operations.

Sitting in the middle of a war zone, Fallujah, Iraq, can quickly become a place where continuity-of-operations (COOP) plans move from theory to implementation. On top of all the everyday problems that can plague civilian information technology managers, the technology infrastructure that supports Marines at Camp Fallujah must withstand rocket-fire assaults and other life-threatening hazards.

Five everyday uses for emergency resources

1. Use the continuity-of-operations network and data replication infrastructure to handle routine data migrations between primary sites.
2. Shift workloads temporarily from the primary to the backup site to perform routine system and software upgrades on the primary site without interrupting user operations.
3. Tap computing resources at the backup site to absorb spikes in workload because of seasonal or other special conditions.
4. Use the backup site's idle capacity as a development platform.
5. Replace workers' desktop computers with laptop PCs that can support routine telework arrangements or remote access to agency servers from other locations during an emergency.

So it’s somewhat ironic that IT officials in Fallujah used their COOP not for a major emergency but to stage an orderly migration of data storehouses when commanders decided last spring to move the camp to a new location.

Officers quickly dismissed the option of physically transporting 17 terabytes of mission data, e-mail records and standard operating procedures from the camp to the new facility at Al Asad Air Base, about 90 miles away.

Improvised explosive devices, “foul weather and the potential for accidents in transit made the risk of physically moving the [file servers] by [helicopter] or convoy unacceptable,” said Capt. Criston Cox, senior data systems and IT officer for the Marines in Iraq in 2008.

Instead, with help from Marines in the 9th Communication Battalion, Cox devised a data-migration plan that piggybacked on the new COOP he had implemented when he arrived at the camp in early 2008.

“In the end, our COOP also served as a viable data-migration strategy, leaving the original data stores unscathed until the new site was 100 percent operational,” he said.

Not all continuity plans are tested in combat conditions. But especially in times of shrinking IT budgets, technology managers are looking for ways to protect their resources in an emergency without paying for idle capacity that doesn’t contribute to daily operations. IT managers and consultants say that with the right IT architectures in place, organizations can safely put their COOP resources to work in multiple roles.

A number of options

Although concrete numbers aren’t available, the double-duty COOP is a growing trend, said Bill Malik, a research director at Gartner.

One of the most common ways of achieving dual goals is to modify traditional COOP data-center configurations. The time-tested approach is to support an active production facility with a backup failover site dedicated to COOP and activated only during an emergency. Now organizations can vary that theme by distributing everyday workloads between the two sites.

“If you have two sites that are fairly close in terms of their configurations, you might use that as a strategy to avoid scheduled outages,” Malik said.

For example, to upgrade an operating system, the IT staff might migrate all the production operations to the ancillary site and apply the upgrade to the main facility. After technicians bring that data center back online, they reverse the process. Both centers are updated, and users don’t experience downtime.
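As a rough illustration of that sequence, the following Python sketch models two sites and rotates an upgrade through them. The `Site` class, the workload names and the version strings are hypothetical stand-ins for whatever migration tooling an agency actually uses. The sketch only tracks which site is carrying which workload at each step.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical model of a data center; not a real management API."""
    name: str
    os_version: str
    workloads: set = field(default_factory=set)


def upgrade_both_sites(main, ancillary, new_version):
    """Upgrade two sites one at a time while the other carries every workload."""
    all_workloads = main.workloads | ancillary.workloads
    home = {main.name: set(main.workloads), ancillary.name: set(ancillary.workloads)}
    steps = []
    for offline, online in ((main, ancillary), (ancillary, main)):
        # Shift everything to the site that stays online.
        online.workloads |= offline.workloads
        offline.workloads = set()
        steps.append(f"{online.name} carries all workloads")
        # Nothing should be dropped while one site is out of service.
        assert online.workloads == all_workloads
        offline.os_version = new_version
        steps.append(f"{offline.name} upgraded to {new_version}")
    # Both sites are current; return workloads to their usual homes.
    main.workloads = home[main.name]
    ancillary.workloads = home[ancillary.name]
    return steps


main = Site("main", "os-1", {"e-mail", "case-files"})
ancillary = Site("ancillary", "os-1", {"test-lab"})
for step in upgrade_both_sites(main, ancillary, "os-2"):
    print(step)
```

The assertion inside the loop is the point of the exercise: at no step is a workload left without a running site. The sketch assumes either site can carry the combined load, which is why Malik's condition about similar configurations matters.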

The approach can also be effective for balancing processing loads when demand spikes occur, said Bill Peldzus, vice president of data center and disaster recovery services at GlassHouse Technologies, an IT consulting firm. In another scenario, one facility runs the main production environment while the other acts as the primary development and testing resource.

“If there are high-speed networking links between those two sites, you could have staff at either site working on the same project when necessary,” Peldzus said. And if an emergency strikes, the ancillary facility stands ready to take over production activities.

But the traditional two-facility architecture isn’t the only option. An alternative is a hub-and-spoke arrangement consisting of a central site packed with enough computing power to become the COOP failover site for a collection of smaller satellite facilities. That design can help organizations better capitalize on computing power across all facilities, said Will Arias, Navy Marine Corps Intranet systems engineering manager at EDS.

“It’s a good alternative to a paired model, where you have to build out the entire infrastructure at each pair, and you may be leaving hardware sitting idle,” he said.

One drawback is the potential for problems during a widespread outage. “If you are trying to do double duty, [the hub] would be able to fail over for one site, but you may not be able to fail over two or more sites,” Arias said.
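The limit Arias describes can be checked with simple capacity arithmetic. The Python sketch below uses made-up numbers in arbitrary units: `HUB_SPARE_CAPACITY` is what an imagined hub has left after its own everyday work, and `SPOKE_LOADS` are the loads of three imagined satellite sites. None of these figures come from the article.

```python
from itertools import combinations

# Hypothetical figures in arbitrary capacity units; none come from the article.
HUB_SPARE_CAPACITY = 100
SPOKE_LOADS = {"site_a": 60, "site_b": 70, "site_c": 40}


def hub_can_cover(failed_sites, spoke_loads=SPOKE_LOADS, spare=HUB_SPARE_CAPACITY):
    """Return True if the hub's spare capacity covers every failed spoke at once."""
    return sum(spoke_loads[s] for s in failed_sites) <= spare


def uncovered_scenarios(spoke_loads=SPOKE_LOADS, spare=HUB_SPARE_CAPACITY):
    """List every combination of simultaneous spoke failures the hub cannot absorb."""
    sites = sorted(spoke_loads)
    return [
        combo
        for size in range(1, len(sites) + 1)
        for combo in combinations(sites, size)
        if not hub_can_cover(combo, spoke_loads, spare)
    ]


for combo in uncovered_scenarios():
    print("hub cannot cover:", ", ".join(combo))
```

With these illustrative numbers the hub covers any single failure but falls short for most pairs. The more everyday work the hub takes on, the smaller its spare capacity and the longer that list of uncovered scenarios becomes.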

Risk avoidance

Cox’s COOP in Iraq used a model in which several primary operational sites had their own dedicated backup sites and could support one another, a feature that was essential to the data migration plan. The primary sites were in Fallujah, Al Asad and Taqaddum Air Base. Each was backed by its own mirroring site a mile or two away, which kept identical, up-to-date copies of data sent from the main facility and was prepared to take over if an emergency occurred.

The cost for all the COOP technologies was about $4.2 million, Cox said.

Once his team received the order to move, it had less than six months to close down the Fallujah IT operations and re-establish them at Al Asad. When it became clear that physically transporting the data was too risky, Cox decided to move the data via the network that connected Camp Fallujah with the Al Asad Air Base.

As part of the COOP infrastructure, Cox's team had installed accelerator devices on the network to increase throughput speeds for the data mirroring and backup operations. The accelerators became an important piece of the data migration project and helped increase transmission rates between the two bases from 2 megabits/sec to 6 megabits/sec.
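For a sense of scale, the quoted link rates can be turned into raw transfer times. The Python sketch below is back-of-the-envelope arithmetic only. It assumes the roughly 17 terabytes cited earlier means decimal terabytes and that every byte crosses the link at the quoted rate. The article does not say how much compression or other data reduction the accelerators achieved, so actual volumes on the wire may have been much smaller.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bits_per_sec):
    """Days needed to push data_bytes over a link at a sustained line rate."""
    return data_bytes * 8 / link_bits_per_sec / SECONDS_PER_DAY


DATA_BYTES = 17e12   # assumption: the 17T figure read as 17 decimal terabytes
BEFORE_BPS = 2e6     # 2 megabits/sec, from the article
AFTER_BPS = 6e6      # 6 megabits/sec, from the article

speedup = AFTER_BPS / BEFORE_BPS
print(f"speedup: {speedup:.1f}x")
print(f"at 2 megabits/sec: {transfer_days(DATA_BYTES, BEFORE_BPS):.0f} days")
print(f"at 6 megabits/sec: {transfer_days(DATA_BYTES, AFTER_BPS):.0f} days")
```

Under those assumptions the threefold gain cuts the raw transfer time from about 787 days to about 262 days, which shows why throughput on the link mattered so much to the migration schedule.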

When moving day came last November, some Marines in the 9th Communication Battalion worked at Camp Fallujah while others flew ahead to Al Asad to start the mirroring operations. Cox acted as the project manager orchestrating the move.

“It was a massive air movement to get several thousand people [to Al Asad] throughout the day,” Cox said. “They were basically told that on that day they’d be flying and all e-mails had to stop because I had to let the mirrors catch up to Al Asad.”

By the next morning, the commanding general had resumed control at his post in Al Asad, and all migrated systems were up and running with only a 24-hour break in e-mail communications.

New efficiencies

The National Labor Relations Board is also finding ways to make COOP pay beyond the safety net it provides for emergencies. Two core elements of its plan — a mobile workforce and an outside host for its central data center — reduce the risk of downtime while increasing the day-to-day productivity of employees and the ability of IT managers to quickly adjust resources to meet new demands.

Almost 70 percent of NLRB employees now use laptop PCs that attach to the office network via docking stations rather than using desktop PCs, and that figure could grow to 85 percent by the end of the year. The agency combines computer mobility with an off-site data center managed by service provider Savvis Federal Systems to keep the agency running if its headquarters facility is shut down because of a man-made or natural disaster.

“People still can get to servers as long as they can find an Internet connection,” said Richard Westfield, NLRB’s chief information officer.

He sees other benefits, too. “It also makes sense from a productivity point of view,” he said. “Since everything is centralized and not stored on local file servers, you can shift work around to other regions.”

For example, NLRB operations in New Orleans were knocked out after Hurricane Katrina struck. If the current centralized services had been functional then, NLRB offices in other areas could have accessed the New Orleans data and potentially kept some of those operations running at an alternative location, Westfield said.

Under NLRB’s contract with Savvis, the company provides centralized hosting services along with processing, storage and networking resources that the agency provisions as needed. That leasing arrangement allowed NLRB to sign a blanket purchase agreement with the provider that establishes the prices of anticipated future IT needs.

“If we need scalability in terms of more CPU power, additional storage or network connectivity, they can provision it in a matter of days,” Westfield said. Acquiring the added resources through the traditional procurement process could take months, he added.

That flexibility proved valuable when NLRB needed to bolster the performance of its main public-facing Web site, which was seeing steady increases in traffic. “It was getting many hits, so several months ago, we decided to get more processing power for it,” Westfield said. “We have the ability to function as a real-time enterprise because we already have the contractual arrangements in place.”