Agencies grapple with complexities of disaster recovery efforts
Disaster recovery plans once were simple: Back up the mainframes' data overnight onto magnetic tapes and toss the tapes into a closet. That might have sufficed 20 years ago, but not today.
Although disaster recovery has been around for decades, the job has changed over the last few years. What is considered a disaster has changed, as have the environments that agencies need to protect.
In recent years, the decision about how to deal with disasters has become tougher as the data has become more important, agencies and vendors said.
For example, in a typical federal environment, the targeted recovery time is 24 to 48 hours, said Tom Mazich, vice president for Comdisco Inc.'s government disaster recovery operation in Arlington, Va. "But the window is shrinking," he said. "More projects are falling into the shorter window. As you leverage your technology investment, you become more dependent on it."
In response, disaster recovery specialists such as Comdisco and SunGard Recovery Services Inc., along with systems vendors, have introduced a new generation of recovery solutions, such as electronic vaulting, in which critical data is sent electronically to an off-site location rather than loaded onto reels of tape.
However, distributed computing - in which agencies store critical data outside the data center in the workgroup environment - remains a puzzle that agencies have yet to figure out.
Defining a Disaster
Over the years, more and more events have been labeled "disasters" by information technology professionals. International Data Corp. divides disasters into three categories: natural, man-made and political.
Natural disasters are those most commonly associated with disaster recovery - floods, storms, forest fires and so on, according to Lisa Ross, senior analyst for IDC. "Man-made disasters are building fires, power failures, hacking or human error," she said. "Political disasters are riots, strikes or bomb threats."
Of these disasters, the most common are man-made, Ross said, such as hardware and software malfunctions, power failures and human error, although the natural disasters get the most press. Ultimately, the value of the data determines how far agencies are willing to go to protect it, agencies and users said.
For example, some agencies can run "shadow" or mirrored operations - essentially running duplicate processing systems in a separate location. But a mirrored system is cost-effective only for the most mission-critical systems, agencies said. Otherwise, something less than complete replication but more than hard-copy printouts is probably appropriate.
"Everything is seen from a cost perspective," said David Krohmal, manager of the General Services Administration's Federal Systems Integration and Management Center's (Fedsim) Disaster Recovery Services Program, which, working with Comdisco, provides services to agencies throughout government.
"At one extreme, you have the shadow operation, ensuring that you could be immediately up and running, but there's a cost associated, and there are very few functions where that's justified," Krohmal said.
However, agencies also have a multitude of less costly solutions from which to choose because vendors are expanding the types of services they offer.
"The field has expanded to include providing Internet security, virus security and emergency response service," said Mike Solter, project manager of networking development for IBM Corp.'s Business Recovery Services. "We're also doing things that provide more value to the customer," he said. Those added services include consulting and educational services at the inauguration of a recovery scheme.
Data Recovery
But data recovery remains one of the key elements of any disaster recovery plan. It's still being done primarily by backing up data onto tape and storing it in a safe facility off site. Magnetic tape remains the medium of choice, despite numerous obituaries published for tape over the last few decades.
Traditionally, backing up data and storing it in a safe location meant running the tape manually, taking the tape off the machine by hand and physically moving the tape to another facility.
But "there are technologies that let you do that in an electronic and automated fashion," Mazich said. "You can take the human out of it."
Comdisco's data recovery services can provide capabilities on several different levels. Backup can be automated on a local basis through network administrators, although data still has to be moved off site. That, too, can be accomplished over telecommunications links - typically T-1 or T-3 - where data is captured by a host similar to the original system.
Mazich described it as an integrated solution that includes network hardware and software components. Bandwidth is expensive, and the solutions are still used primarily for critical data rather than for a whole system. The company uses a number of third-party software products while providing its own backup and automation infrastructure for the solutions.
Like Comdisco's customers, SunGard customers usually back up to tape manually and physically transport tapes to an off-site facility. This remains the primary means of data recovery, "but the emerging paradigm is being able to protect data at two places at the same time without going through those manual methods," said Jim Lindeman, director of storage solutions for SunGard.
SunGard now offers customers the opportunity to "shadow" critical databases directly to a SunGard facility off site. Such capabilities have been available for 10 years or more, but they are becoming more popular because data is becoming more precious and because the technology is maturing, the company said.
"When it was a software solution, it was cumbersome and application-specific," Lindeman said. The solution takes advantage of the disk-mirroring capabilities inherent in many systems. "It is now a hardware solution where the storage controller itself takes responsibility of capturing the I/O and moving it to another location," he said.
Bandwidth and cost concerns still tend to limit the application of shadowing to mission-critical applications. "The technology doesn't take the place of tapes," Lindeman said. "It facilitates a different rate of recovery."
SunGard offers its system-shadowing capabilities as part of its SunVault services. Other services offered include electronic and online tape vaulting.
There are a host of capabilities that "keep an environment functioning," said Mike Braham, director of Bell Atlantic's new CommGuard disaster recovery services business. These include the entire communications infrastructure, from data to voice to fax and e-mail.
"Sit at your desk and take a 360-degree turn," he said. "Everything you see when you make that turn is covered by continuity planning."
Distributed Disasters
Arguably, many of the changes being wrought in disaster recovery can be laid at the doorstep of the new emphasis on distributed computing.
"Seven or eight years ago, distributed computing was small," said Tom Sobocinski, federal account executive for SunGard, Herndon, Va. "Today, in excess of 50 percent is midrange and LAN environment [in the commercial sector]. Even though the federal government is still heavily mainframe-oriented, they still have the Sun systems, the HP machines and the PCs in a multi-tiered environment," Sobocinski said.
That means, observed IDC's Ross, that "mission-critical information may now reside throughout an organization" rather than simply in a data center. And that makes the idea of data recovery that much more complex.
"That complicates recovery," Comdisco's Mazich said. And, as is the case for most software tools, the disaster recovery software tools available for the distributed environment just aren't as sophisticated as those for the mainframe environment.
On the other hand, distributed computing does have its advantages. "It's heightened the users' awareness," said Jim Seligman, director of the Information Resources Management Office for the Centers for Disease Control and Prevention in Atlanta. Traditionally, backup and data recovery was done in the back room, with little or no awareness on users' parts that anything had happened, Seligman said.
Also, "by its very nature, distributed computing means less vulnerability; there is no single point of failure that will wipe out your capabilities," he said.
But that same distribution puts considerably more onus for disaster recovery in the hands of individual users and network administrators. Rather than relying on a centralized backup and recovery scheme, individuals become responsible for their own systems.
"We have 7,000 users," Seligman said. "It's virtually impossible to get everyone up to the same level of awareness, knowledge and responsibility. It's a challenge just to get them to do things like routine data backup and off-site storage."
Fedsim and contractor Comdisco have focused on the mainframe environment, although Fedsim has offered some support for PCs, workstations and local-area networks. Fedsim has watched the disaster recovery environment change dramatically over the last four years. "When we awarded the contract four years ago, there were a lot of mainframe operations that had no recovery capability whatsoever," Krohmal said. "Today, that's flipped; you'd have to look long and hard to find a data center without a plan."
On the other hand, a complete disaster recovery plan is still rare among agencies. "It's a minority that has some kind of arrangement in place," he said. "They're doing their backups and their off-site storage, and they may even have some idea where to go to purchase a bunch of PCs on short notice. But few have all the specifics down pat and are doing regular testing."
Army Tackles Full-Scale Backup
The Army's Aberdeen Proving Ground is an example of an installation that is grappling with the full scope of disaster recovery. "We have the traditional six-month backup procedure," said Charles Nietubicz, director of the Army Research Lab, a Major Shared Resource Center at Aberdeen Proving Ground. "We can pretty much recover research data within that time frame," Nietubicz said.
But the installation is still thrashing out its complete recovery plan. For one thing, the backup tapes, while stored in a separate building from the data center, are still on site. A massive disaster might thus render the current plan moot. The evolving disaster recovery scheme will probably involve a number of neighboring Army data centers sharing the same off-site data facility.
Despite the plethora of technology solutions, disaster recovery planning will continue to be a challenge for agencies. As CDC's Seligman said, "We have really good planning when it comes to emergency response to deal with a health crisis or [disease] outbreak, [but] our own response plans [to IT disasters] aren't quite as good."
-- Lazar is a free-lance writer based in Tenafly, N.J.
* * *
AT A GLANCE
Status: In response to a need for better disaster recovery strategies, industry vendors have developed sophisticated technical solutions.
Issues: Most solutions are tailored for the data center, but agencies now put more data in distributed computing environments.
Outlook: Unclear. Technology continues to evolve, but agencies still appear unsure about how to deal with the increasing complexities of their disaster recovery requirements.