Looking into disk mirroring

Feds weigh options for data protection

It's a classic double whammy. Not only do information technology managers oversee more data than ever before, they also face more threats to this data than ever before. Fires and natural disasters such as floods have long been data center worries. Now add to this list post-Sept. 11 terrorism concerns and the specter of both physical and electronic attacks.

The situation has compelled government agencies to revisit their data protection plans. The time-tested practice of tape-based backup is among the items under evaluation. For many organizations, tape is the main insurance policy against data loss, but restoring data from tape can be time-consuming. Tapes must be retrieved from the backup facility and the data then loaded on disk arrays at the production site, a process that can take days.

This lengthy process has prompted some data center managers to examine an alternative method: disk mirroring. With disk mirroring, an organization continuously replicates selected data from one disk array to another. Replication can be local, within a data center, or remote, to a distant data center.

The key advantage of disk mirroring is speed. Recovery may take just a few minutes, particularly if the system is set to automatically connect to the backup system. Timeliness of data is another plus. Disk mirroring provides a more up-to-date copy of data than tape backup does. Data stored on tape is only as current as the latest backup session, and sessions are typically scheduled 12 or more hours apart.

But disk mirroring, which has been around for years, is no miracle data-protection cure. One drawback is the cost. Storage still costs less per megabyte on tape than on disk, although the gap is narrowing. Disk mirroring may also involve higher network infrastructure costs, because high-bandwidth links may be required between backup and production sites.

Organizations considering disk mirroring must also weigh the different deployment approaches, each of which has its pros and cons.

Still, the attraction of increased data availability has prompted many commercial entities in areas such as financial services to adopt disk mirroring. More federal agencies are also beginning to investigate the approach.

"We're seeing a lot of interest," said Robert Manchise, chief technology officer at Anteon Corp., a Fairfax, Va., government integrator. Government agencies "are following the commercial marketplace."

For example, the Federal Emergency Management Agency knows "the importance of having synchronized data, [in] real time and up to the minute," he said.

Mirroring Issues

Cost tends to be the first issue organizations confront when considering mirroring. Managers who want to mirror to the same brand and model of storage subsystem used in the production facility will pay a price for this duplication. The cost, however, can be reduced if an organization opts for heterogeneous mirroring using less expensive disk arrays at the backup site (see "SANs take on mirroring," Page 34).

Network investment is another factor. Today, mirroring generally requires a dedicated network infrastructure, separate from the one that carries IP traffic. Typically, this means employing Fibre Channel technology. An organization could use Dense Wavelength Division Multiplexing (a technology that transmits multiple light signals simultaneously, each at its own wavelength, over a single optical fiber) to merge data mirroring and voice traffic over IP, but experts say this is an expensive choice.

IP-based storage solutions are beginning to surface, however. In addition, the emerging SCSI over IP protocol, or iSCSI, promises to enable organizations to run storage traffic on existing IP networks (see "iSCSI may cut mirroring costs," Page 33).

Given the expense, technology managers tend to use disk mirroring only for mission-critical datasets.

Mirroring "gets expensive if you have a lot of data," said Bill Swartz, manager of infrastructure computing services at Sandia National Laboratories in Albuquerque, N.M. Accordingly, mirroring is done sparingly in high-performance computing environments, he said.

Michelle Butler, technical program manager for the storage-enabling technologies group at the National Center for Supercomputing Applications, said the center is not mirroring all of its data.

"We are mirroring critical system-dependent file systems," she said. "We are not opposed to tape and still use it. But for disaster recovery, disk is much faster."

Paths to Mirroring

Once a decision is made on which data to mirror, the next step is to determine what mirroring approach to take. The two main paths are synchronous and asynchronous.

In synchronous mirroring, data isn't committed to storage in the production facility until a copy of that data is delivered to the target storage subsystem at the mirror site. When the data is successfully mirrored, the target subsystem acknowledges the receipt to the production facility, which then completes the storage transaction.

This way, if the input/output request fails on the target side, it is not recorded on the production side. Both sides of the storage equation are kept in balance, up to the last completed transaction.
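The write path described above can be sketched in a few lines of Python. This is a toy, in-memory illustration only: the class names `TargetArray` and `SyncMirroredArray` are hypothetical and do not correspond to any vendor's product or interface.

```python
class TargetArray:
    """Hypothetical stand-in for the storage subsystem at the mirror site."""

    def __init__(self):
        self.blocks = {}
        self.online = True

    def write(self, block_id, data):
        # Return True as the acknowledgment, or False if the write fails.
        if not self.online:
            return False
        self.blocks[block_id] = data
        return True


class SyncMirroredArray:
    """Production array that commits a write only after the target acknowledges it."""

    def __init__(self, target):
        self.blocks = {}
        self.target = target

    def write(self, block_id, data):
        acknowledged = self.target.write(block_id, data)
        if not acknowledged:
            # The target did not confirm, so nothing is recorded locally.
            return False
        self.blocks[block_id] = data
        return True


target = TargetArray()
primary = SyncMirroredArray(target)

primary.write(1, b"payroll")    # mirrored first, then committed locally
target.online = False           # simulate a failed link or target array
primary.write(2, b"invoices")   # rejected, so neither side records it

print(primary.blocks == target.blocks)  # prints True
```

Because the local commit happens only after the acknowledgment, the two copies never disagree about which writes completed.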

The main drawback of synchronous mirroring is latency. Data has to travel from primary to target storage, and the acknowledgment has to travel back, before each write completes. The delay may cause applications to time out. Latency is not an issue if the primary and target storage sites are close together. But to guard against site failure, an organization may locate a backup site miles from its production facility. Storage experts say that if the distance is more than 62 miles (about 100 kilometers), latency will affect application performance.
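A rough calculation shows why distance matters. The sketch below assumes light travels through optical fiber at roughly 200,000 kilometers per second, a common rule of thumb rather than a figure from the experts quoted here, and it counts only propagation delay. Switches, protocol overhead and the arrays themselves add more. The function name `round_trip_ms` is illustrative.

```python
KM_PER_MILE = 1.609344
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_KM_PER_SECOND = 200_000.0


def round_trip_ms(distance_miles):
    """Propagation delay for one write plus its acknowledgment, in milliseconds."""
    round_trip_km = 2 * distance_miles * KM_PER_MILE
    return round_trip_km / FIBER_KM_PER_SECOND * 1000.0


for miles in (1, 62, 1000):
    print(f"{miles:>5} miles: {round_trip_ms(miles):.3f} ms added per synchronous write")
```

Under that assumption, a 62-mile link adds about one millisecond to every write, and the penalty grows in direct proportion to distance.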

Asynchronous mirroring, however, may be the solution for mirroring data over great distances. In this approach, the local storage device does not wait for acknowledgment from the remote system before storing data, thus overcoming the latency problem.

But there is a downside. Data inconsistency may occur in the event of a failure, because the asynchronous approach does not wait for the remote storage device to confirm the receipt of data.
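The trade-off can be illustrated with a toy Python model. The class `AsyncMirroredArray` and its methods are hypothetical, and real products track and ship pending writes in their own ways. The point is only that a write is committed locally before the remote copy has it.

```python
from collections import deque


class AsyncMirroredArray:
    """Toy model: writes commit locally at once and are shipped to the remote site later."""

    def __init__(self):
        self.blocks = {}          # production copy
        self.remote_blocks = {}   # copy at the distant site
        self.pending = deque()    # writes not yet shipped

    def write(self, block_id, data):
        # No waiting on the remote site, so distance adds no delay here.
        self.blocks[block_id] = data
        self.pending.append((block_id, data))

    def replicate(self, max_writes):
        # Ship up to max_writes queued writes to the remote site, in order.
        shipped = 0
        while self.pending and shipped < max_writes:
            block_id, data = self.pending.popleft()
            self.remote_blocks[block_id] = data
            shipped += 1
        return shipped


array = AsyncMirroredArray()
for i in range(5):
    array.write(i, f"record-{i}")
array.replicate(max_writes=3)

# If the production site failed now, these writes would exist only locally.
lost_if_failure = [block_id for block_id, _ in array.pending]
print(lost_if_failure)  # prints [3, 4]
```

Whatever is still in the queue at the moment of failure is the "small number of input/outputs" at risk.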

"A small number of input/outputs can be lost," said John Selep, product marketing manager for storage solutions at Hewlett-Packard Co. But asynchronous mirroring, he continued, "allows those willing to lose a small number of input/outputs to do continental-level mirroring."

Jay Desai, manager of data protection solutions at Network Appliance Inc., which produces SnapMirror software, said asynchronous mirroring works well except for those applications requiring "100 percent data concurrency."

"Not everyone needs synchronous" mirroring, said Bob Guilbert, vice president of marketing and business development at NSI Software, which markets DoubleTake, a mirroring product.

Still, federal customers and integrators say that the synchronous mirroring approach is the way to go for an agency's most important data. "We see synchronous being more important...for mission-critical" data, said Anteon's Manchise.

"Asynchronous mirroring isn't real-time enough for most enterprise applications and loses transactions during failover," said Dave Puzycki, a senior research scientist with Pacific Northwest National Laboratory's IT infrastructure group. The lab is deploying a storage-area network (SAN) architecture and looking into disk mirroring. But it is doing so with deliberation.

"One disadvantage of using disk mirroring is that database corruption can also be mirrored, reducing the possibility of error-free database recovery," Puzycki said.

To compensate, the lab is exploring SAN software technology that can take snapshots, or copies, of a database. "Snapshots can be instantly created and take [up] little space, therefore making it much easier to fall back to a known good state," Puzycki said.
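One common way to get snapshots that are created instantly and take little space is copy-on-write, sketched below in Python. This is an assumption for illustration, not a description of the software the lab is evaluating, and the `SnapshotVolume` class is hypothetical. The snapshot starts empty and saves a block's old contents only when that block is first overwritten.

```python
_MISSING = object()  # marks a block that did not exist when the snapshot was taken


class SnapshotVolume:
    """Toy copy-on-write volume that supports a single snapshot."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.saved = None  # old contents of blocks changed since the snapshot

    def take_snapshot(self):
        # Nothing is copied up front, so creation is effectively instant.
        self.saved = {}

    def write(self, block_id, data):
        # Preserve the old contents the first time a block changes.
        if self.saved is not None and block_id not in self.saved:
            self.saved[block_id] = self.blocks.get(block_id, _MISSING)
        self.blocks[block_id] = data

    def roll_back(self):
        # Return the volume to its state at snapshot time.
        if self.saved is None:
            raise RuntimeError("no snapshot has been taken")
        for block_id, old in self.saved.items():
            if old is _MISSING:
                del self.blocks[block_id]
            else:
                self.blocks[block_id] = old
        self.saved = {}


db = SnapshotVolume({0: "header", 1: "accounts", 2: "orders"})
db.take_snapshot()
db.write(1, "corrupted")   # the bad write would be mirrored too
print(len(db.saved))       # prints 1: only the changed block was preserved
db.roll_back()
print(db.blocks[1])        # prints accounts
```

The snapshot's size tracks the amount of data changed since it was taken, not the size of the database, which is why falling back to a known good state is cheap.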

The intricacies of disk mirroring call for cautious steps. But for agencies with sensitive data to protect, mirroring may be worth looking into.

Moore is a freelance writer based in Chantilly, Va.