New storage file systems make large-scale data sharing easier
Storage-area networks (SANs), which enable multiple servers to share communal pools of storage, are proving themselves to be a vital resource in today's world of data-hungry applications. But until recently, SANs have lacked a way for users to share files, something that's crucial if they are to allow the kind of uninhibited information access that government workers increasingly need.
In particular, SANs haven't provided a way for users working with different operating systems to access shared data at the file level. File-level sharing across platforms became possible several years ago with the debut of network-attached storage (NAS) systems, which can serve files to different platforms but are poorly suited to handling the very large files common in government applications.
SANs have provided multiple servers with shared file access of sorts for several years, though only in homogeneous, single-operating-system environments. The holy grail of SAN development is heterogeneous file access across multiple operating systems, which will enable users in a mixed-platform environment — the norm in many agencies — to have seamless access to the consolidated storage available in a SAN.
This multiple-operating-system problem has been notoriously hard to overcome because the way files and data are stored differs from one operating system to another, and from one file system to another.
But a new generation of SAN technology from the likes of SGI, IBM Corp. and others promises to do just that, providing agencies with an opportunity to build data storage infrastructures that are far more amenable to the type of data sharing and integration initiatives many plan, while significantly lowering their administrative costs.
True Data Sharing
Products currently available, such as Tivoli Systems Inc.'s SANergy file-sharing software, solve part, but not all, of the puzzle.
With SANergy, "you can create data on a particular platform, and then access that data from a set of heterogeneous platforms," said Jose Iglesias, director of storage management for Tivoli, an IBM subsidiary. "You can create a file on a [Microsoft Corp.] Windows platform, for example, and have [an Apple Computer Inc.] Macintosh or other application go off and work with that file."
However, he said, this approach doesn't provide true data sharing. That requires a system, independent of any operating system, that provides a uniform way to store and access files and supports data integrity features such as file locking — a number of users share access to a single file, and changes to its data are coordinated so they don't conflict.
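The coordination that file locking provides can be illustrated with a minimal sketch. This is not any vendor's API — it simply uses POSIX advisory locks (Python's `fcntl` module, Unix only) to show how multiple writers to one shared file can take turns rather than clobber each other's changes:

```python
import fcntl
import os

def update_shared_file(path, new_line):
    """Append a line to a shared file under an exclusive advisory lock,
    so concurrent writers cannot interleave or overwrite each other's changes."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold the exclusive lock
        try:
            f.write(new_line + "\n")
            f.flush()
            os.fsync(f.fileno())        # make the change durable before unlocking
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A SAN file system performs the same kind of arbitration, but across separate machines rather than processes on one host, which is why it needs its own lock-management service.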
IBM's Storage Tank, which the company may release this year, reportedly supports multiple operating systems and features such as file locking. The brains of Storage Tank will be a cluster of servers running Linux that manages the SAN's metadata — a sort of index system for the files on the various SAN storage devices.
The Linux cluster will communicate with Storage Tank clients on application servers running various operating systems, and those servers will be able to exchange data freely with one another, with the Storage Tank software taking care of file locking.
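The key idea in that design — a metadata service that holds only the index while clients move data directly over the SAN — can be sketched in a few lines. The classes and placement scheme below are hypothetical illustrations, not Storage Tank's actual interfaces:

```python
class MetadataServer:
    """Hypothetical sketch: the metadata cluster keeps only the index
    (file name -> block locations); it never carries the data itself."""

    def __init__(self):
        self.index = {}                  # file name -> list of (device, block)

    def allocate(self, name, placement):
        self.index[name] = placement

    def locate(self, name):
        return self.index[name]


class Client:
    """Clients consult the metadata server once for a file's placement,
    then read the blocks directly from the SAN storage devices."""

    def __init__(self, metadata, devices):
        self.metadata = metadata
        self.devices = devices           # device id -> {block number: bytes}

    def read(self, name):
        placement = self.metadata.locate(name)
        return b"".join(self.devices[dev][blk] for dev, blk in placement)
```

Because the bulk data transfer bypasses the metadata cluster, clients on different operating systems only need to agree on the small metadata protocol, not on each other's native file system formats.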
Meanwhile, at the beginning of June, SGI introduced a new version of CXFS, its clustered shared file system, that enables heterogeneous file sharing across a SAN, according to Robert Murphy, SGI's marketing manager for storage. The file system can simultaneously distribute multiple read-only copies of a file to different authorized client systems. When a client system wants to change a file, CXFS releases the single "write" version of the file to that system and ensures that no read-only versions are open at the same time, preventing any client from inadvertently working with an outdated version of the file.
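That single-writer, multiple-reader discipline can be modeled as a token manager. The sketch below is a simplified illustration of the general technique, not CXFS's implementation — in particular, a real system would wait for readers to finish rather than simply recalling their tokens:

```python
class FileTokenManager:
    """Hypothetical sketch of single-writer/multiple-reader token handling:
    many clients may hold read tokens at once, but a write token is granted
    only after every outstanding read token has been recalled."""

    def __init__(self):
        self.readers = set()
        self.writer = None

    def acquire_read(self, client):
        if self.writer is not None:
            return False                # the write token is out; no new readers
        self.readers.add(client)
        return True

    def acquire_write(self, client):
        if self.writer is not None:
            return False                # only one write token exists
        self.readers.clear()            # recall all read tokens first
        self.writer = client
        return True

    def release(self, client):
        if self.writer == client:
            self.writer = None
        self.readers.discard(client)
```

The invariant the manager enforces — readers and the writer never coexist — is exactly what prevents a client from working with a stale copy of the file.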
The advances in file access systems for SANs should help make networked storage even more attractive to users, said Steve Whitner, marketing manager for storage vendor Advanced Information Digital Corp. (ADIC).
"SANs are very well attuned [to] handling large blocks of data and for providing the high bandwidth needed for moving that data around a network," he said, "but they've been historically bad for sharing data, since you put data onto a SAN in such a way that it makes it look to the user just like the old kind of [directly attached storage that] supplies data in a single format over the network."
ADIC's answer is a product called CentraVision, which enables servers running different operating systems to use a distributed file system to share data.
It is not clear whether this will immediately make SANs attractive to more customers and boost adoption, which most observers say is still in its early phase.
Gartner Inc.'s Dataquest unit sees fairly strong growth in the overall market. Worldwide sales of Fibre Channel, the main SAN protocol, rose 13 percent in 2001 — a generally difficult year for the industry — to $1.46 billion, according to the firm, which predicts that double-digit growth will continue through 2006, when revenue will approach $6.7 billion.
Some believe SAN file systems could be an important piece of the puzzle. If data sharing and direct file access are shown to work well, "then I think it would take [the use of SANs] to a whole new level," said Ben Kobler, a computer scientist at NASA's Goddard Space Flight Center who is involved with the data-intensive Goddard Earth Observing System. "Traditionally, we've only been able to control hardware components over the SAN, but now we can actually begin to share data."
NASA Goddard has only local, homogeneous SANs now, he said, which are not used for sharing data across the wider network. However, there will be a need to share data at some point, and the agency has been evaluating products such as Tivoli's SANergy and ADIC's CentraVision.
"As more and more true file sharing becomes available, then the demand will increase, but that will also require a much closer management of the data so we don't accidentally make changes to the ways the [storage] is structured," Kobler said.
A largely unknown and important factor, he said, is how well the new data and file-sharing systems will scale as storage volume grows.
At least initially, most of the demand for file-sharing SANs is expected to come from large-volume data users — such as NASA Goddard — that are used to handling terabytes of data at a time and have a need for frequent review of datasets by multiple users.
"On the scientific side of our organization, where there's a need to take data and feed it back, where there's more collaboration involved, they do have a need for shared systems and they are testing products now," said Mark Gutscher, project leader for Unix system administration at Sandia National Laboratories. "On the business side, we have set up a Windows NT file server with a SAN on the back end that manages lots of little files."
Although he sees expanding demand for SANs across the Sandia organization because of their flexibility and features such as mirroring (a method for creating a redundant system), he doubts that there will ever be a similarly broad call for file-sharing systems, because the business side probably won't need the capability.
Tivoli's Iglesias also thinks the first demand for the newer file-sharing systems will come from scientific and engineering organizations that are used to managing petabytes of storage. However, he said, most users may not yet need those capabilities.
But the notion of true data sharing is evolving quickly, he said, and is beginning to move from test centers into production environments, with people looking to build data centers around it.
"Data sharing SANs are becoming mainstream," he said.
Robinson is a freelance journalist based in Portland, Ore. He can be reached at hullite@mindspring.com.