FAQS: Grid computing

Are grids coming to a data center near you?

Grid computing is a way to share computing resources within and among organizations. The concept first emerged in the mid-1990s as academic researchers began exploring the rudiments of grid infrastructures. Around 2000, grids moved beyond the basic research stage as organizations began building grids to support scientific and technical computing applications.

Now, attention has shifted to the wider application of grid infrastructures in both commercial and government settings. Technology watchers believe grids will eventually become a staple of mainstream computing. Although some progress has occurred among enterprise data centers, that vision remains largely unfulfilled. Issues that block further adoption of grids include per-processor licensing requirements and a dearth of grid-enabled applications.

FAQ
What is grid computing, and what are the main variations?

Different definitions and interpretations of grids exist, some of them conflicting. Ian Foster, head of the Distributed Systems Laboratory at Argonne National Laboratory, defines a grid as a system that coordinates resources that are not subject to centralized control and live in different domains — for example, different groups within the same organization. Foster also defines grids as employing a mix of standard and open protocols and interfaces.

In this view, a grid is an open, standards-based system that lets users in one group tap into computers and applications residing in other, possibly far-flung groups. The Global Grid Forum (GGF) aims to standardize this vision of grid computing. It promotes the Open Grid Services Architecture, which defines a set of core capabilities for grid computing. The forum works with the Globus Alliance, which backs development of the Globus Toolkit, an open-source utility for building grids.

Meanwhile, a number of major hardware and software vendors, including Hewlett-Packard, Oracle and Sun Microsystems, have developed their own variations on the grid theme. IBM has a grid computing initiative, but the company works closely with GGF and the Globus Alliance.

Those companies also describe grids in terms of shared computing resources and support standards such as Web services.

FAQ
Should I choose a vendor-specific or an open-source grid?

Vendors' proprietary grid architectures don't communicate with one another. In other words, a user in one vendor's grid environment can't access resources in another vendor's grid.

Proprietary systems "will federate computing or data for you, but only on the condition that you buy into a particular vendor's approach," Foster said. "There are a lot of concerns about that and pressure on vendors to adopt standards so [users] can plug and play products from different sources."

By comparison, open-source tools such as the Globus Toolkit let organizations apply grid computing "without getting locked into a vendor's approach," Foster said.

But the Enterprise Grid Alliance (EGA) offers a different take on the interoperability issue. Oracle was the initial force behind this group, which was created last year. In May, the alliance published a reference model for enterprise grids. Peter Lee, chief executive officer of DataSynapse, an alliance member, called the reference model a good start toward "defining how a number of different vendors can interoperate."

Lee said he doesn't expect to see a single specification dominate grid computing the way the widely used Java 2 Platform, Enterprise Edition dominates the development of enterprise applications that are portable among diverse computing platforms.

Instead, a number of vendor-specific grids will exist, and eventually, interoperability standards will link them, he added.

Still, the broad objectives of organizations such as GGF, EGA and others are not necessarily at odds, grid builders said.

"I would say there is a strong sentiment in the community — not just academic researchers but industry — for these various groups like EGA, GGF and [the World Wide Web Consortium] to coordinate in some way," said Charlie Catlett, a senior fellow at the Argonne lab and executive director of the TeraGrid initiative funded by the National Science Foundation.

The issue of which grid approach to choose doesn't have to be an either/or proposition, said Mike Bernhardt, CEO of Grid Strategies, a consulting firm focused on grid computing. A commonly shared, open grid will benefit some customers, while others will find proprietary grids suitable for their purposes, he said.

"Generally speaking, the grid standards and, therefore, standards-based implementations are not yet mature enough or pervasive enough to always choose one for a grid project," said Al Bunshaft, IBM's vice president of grid computing sales and business development. "Therefore, we have a maturing and evolving market today with a mixture of types of software and implementations."

For example, a scientific computing center that needs to collaborate with universities and laboratories might gravitate toward the open-source model. Scientists are already using that approach for projects involving researchers worldwide. The Large Hadron Collider (LHC) project, for instance, was the focus of a recent international grid-testing exercise that spanned the LHC Computing Grid and the Open Science Grid. CERN, the Geneva-based particle physics laboratory, is building the collider, which will study the properties of subatomic particles. It is scheduled to begin operations in 2007.

"To deliver science from the LHC, we need working, interoperable grids," said Ruth Pordes, associate head of the Fermi National Accelerator Laboratory's Computing Division and a member of the Open Science Grid Consortium's board. She said she and other members of the scientific community favor the evolution of a common grid model.

However, an organization seeking to deploy grid computing internally might opt for technology offered through a familiar hardware or software supplier.

FAQ
Are there different kinds of grids for different purposes?

Tim Hoechst, senior vice president of technology in Oracle's Government, Education and Healthcare Division, said different types of computing problems lend themselves to different types of grids. He made the distinction between processor grids and data grids. A processor grid deals with a problem that can be broken into pieces and processed on multiple computers. A data grid handles problems that can't be subdivided and must instead run continuously on multiple computers. He said Oracle concentrates on the latter.

For an example of a processor grid, Hoechst cited the Search for Extraterrestrial Intelligence project, which broke the task of sorting radio signals into chunks and assigned them to thousands of computers. An example of a data grid is a database running across a cluster of computers.
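To make the processor-grid pattern concrete, here is a minimal Python sketch of the chunk-and-distribute approach Hoechst described. It is illustrative only: the function and chunk sizes are hypothetical, and a real grid would farm the chunks out to separate machines rather than local processes.

    # Sketch of the processor-grid pattern: split a job into independent
    # chunks, farm the chunks out to workers, then combine partial results.
    # Illustrative only; a real grid distributes chunks across machines,
    # not just local processes.
    from multiprocessing import Pool

    def analyze_chunk(chunk):
        # Stand-in for real work, such as scanning a slice of signal data.
        return sum(x * x for x in chunk)

    def split(data, n_chunks):
        size = max(1, len(data) // n_chunks)
        return [data[i:i + size] for i in range(0, len(data), size)]

    if __name__ == "__main__":
        data = list(range(100_000))
        with Pool(processes=4) as pool:
            partials = pool.map(analyze_chunk, split(data, 4))
        print(sum(partials))  # same answer as the unsplit computation

The key property is that the chunks are independent, so they can run anywhere and in any order; a data grid, by contrast, cannot be decomposed this way.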

FAQ
What are the benefits of building a grid?

Bernhardt and others said they view the ability to coordinate and share resources as the primary benefit of grid computing. Grids let an organization draw on resource pools within and, in some cases, outside the organization to accomplish a given computing task.

For scientific and technical computing, a grid might provide remote access to expensive and uncommonly deployed equipment. For example, the Network for Earthquake Engineering Simulation grid lets researchers remotely operate equipment and conduct experiments.

Grid computing also improves the utilization of other resources, such as processing power, memory and storage, by putting otherwise idle capacity to work.

FAQ
Can mainstream data centers use grid technology?

Grid advocates believe the technology will appeal to data centers seeking to adopt a computing infrastructure that is normally beyond their reach. But the use of grids in general enterprise settings, as opposed to high-performance computing centers, has only begun to grow in the past few months, experts say.

"We're starting to see an increased amount of grid deployment in the enterprise space," said Walter Stewart, global coordinator of grid strategy at Silicon Graphics.

Steve Tuecke, CEO of Univa and co-founder of the Globus Alliance, said the financial services sector leads mainstream data centers in pursuing grids. Univa plans to release commercial versions of Globus software in the second half of this year.

A similar level of activity at government data centers has yet to kick in, grid experts say.

"It's a work in progress," Foster said of enterprise data center adoption, adding that a few government agencies are planning to launch exploratory grid deployments.

Meanwhile, a number of sources pointed to the Environmental Protection Agency's grid project as a sign of wider government adoption of the technology. A test project announced last September employs grid computing to improve air quality modeling. IBM and Computer Sciences Corp. collaborated on the project. One of the software products used in the test was IBM's Grid Toolbox, which is based on the Globus Toolkit.

"EPA was and still is exploring how to use grid to better service their users," Bunshaft said.

EGA aims to further expand grid computing's inroads at data centers. The organization's reference model "helps describe requirements and standards and is designed to help accelerate enterprise grid adoption," Lee said.

"It gives us all a common playing field," Hoechst added.

Still, significant technical and financial barriers block wider acceptance of grid computing. And would-be resource sharers must also navigate layers of technology to create a grid architecture.

But the need for grid computing may create sufficient momentum to overcome the challenges, experts say.

"The grid vision ...of resource federation and resource sharing across enterprises is something a lot of people are eager to see happen," Foster said.

Two sticking points

Grid technology is past the research and development stage, but barriers stand in the way of its wider acceptance in mainstream computing. Two of those barriers are:

A dearth of applications that take advantage of grid approaches.

Ian Foster, head of the Distributed Systems Laboratory at Argonne National Laboratory in Argonne, Ill., called the adaptation of applications for grid execution a significant obstacle. As grid-enabled software becomes generally available, “the cost inherent in an enterprise adopting a grid is dramatically reduced,” he said.

Costs fall because organizations no longer need to modify applications themselves to get them to run on a grid. Vendors are working on more grid-ready applications.

“We realize for grid [computing] to have wider adoption and to easily implement grid deployments, the applications have to be ready,” said Al Bunshaft, vice president of grid computing sales and business development at IBM.

To that end, IBM has launched Think Grid workshops, which promote grids to application developers. Bunshaft said IBM has enabled 40 software vendors’ products to run in grid environments.

Old pricing models are problematic.

Traditional per-CPU software licensing is another grid impediment. "In a grid, you don't know in advance where an application is going to run," Bunshaft said. "A customer can't possibly be expected to have a license purchased for every possible machine that an application might run on."

He said enterprise software licenses lend themselves to grid computing. But IBM is working with the Global Grid Forum to develop an overall approach to grid software licensing, he added. "We need an industry mechanism for doing this," he said.

— John Moore

A many-layered grid

The making of a grid involves a number of technology layers, a structure that supports a step-by-step deployment strategy, some industry executives say.

Although different grid approaches exist, most experts agree that virtualization represents the foundation of grid computing. Server virtualization involves carving servers into partitions and recasting physical resources, such as processors, memory and input/output adapters, into logical divisions. Each partition within a server acts as an independent entity that can be assigned its own operating system and application. Resources from the pool may be allocated to the partition as workload changes.
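As a rough illustration of that resource-pool idea, the following Python sketch models a physical server as a pool of processors and memory from which logical partitions are carved and later resized as workloads change. The class and attribute names are hypothetical; this is a toy model, not any hypervisor's actual interface.

    # Toy model of server virtualization: a physical resource pool from
    # which logical partitions are allocated and resized. Hypothetical
    # classes for illustration, not a real hypervisor interface.
    class ResourcePool:
        def __init__(self, cpus, memory_gb):
            self.free_cpus = cpus
            self.free_memory_gb = memory_gb

        def allocate(self, cpus, memory_gb):
            if cpus > self.free_cpus or memory_gb > self.free_memory_gb:
                raise RuntimeError("insufficient resources in pool")
            self.free_cpus -= cpus
            self.free_memory_gb -= memory_gb
            return Partition(self, cpus, memory_gb)

    class Partition:
        def __init__(self, pool, cpus, memory_gb):
            self.pool, self.cpus, self.memory_gb = pool, cpus, memory_gb

        def grow(self, cpus=0, memory_gb=0):
            # Draw more resources from the pool as workload increases.
            if cpus > self.pool.free_cpus or memory_gb > self.pool.free_memory_gb:
                raise RuntimeError("insufficient resources in pool")
            self.pool.free_cpus -= cpus
            self.pool.free_memory_gb -= memory_gb
            self.cpus += cpus
            self.memory_gb += memory_gb

    server = ResourcePool(cpus=16, memory_gb=64)
    db_partition = server.allocate(cpus=4, memory_gb=16)  # gets its own OS and app
    db_partition.grow(cpus=2)                             # rebalance under load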

XenSource's open-source products and EMC's VMware unit provide server virtualization. A bevy of storage virtualization products perform a similar resource-pooling role on the data side.

Peter Lee, chief executive officer of DataSynapse, said he calls the next layer "infrastructure provisioning." This software layer automates configuration management, the "bare-metal" provisioning of operating systems and applications to servers in a grid environment. IBM's Tivoli product unit, LANDesk and Opsware are among the vendors operating in this space.

Lee said he labels the layer above infrastructure provisioning as service provisioning, or orchestration. This layer identifies the available resources in the virtualized pool and assigns them to application tasks.

"Once you start to have multiple application tasks looking for resources to run on, you need to make sure resources are allocated to the tasks in some kind of prioritized fashion," said Al Bunshaft, vice president of grid computing sales and business development at IBM. IBM, Hewlett-Packard and Sun Microsystems are among the vendors pursuing this field. The open-source Globus Toolkit also handles resource discovery and allocation.

Application service virtualization represents yet another grid layer. The idea is to make multiple applications easier to execute in a grid environment. DataSynapse's GridServer and Platform Computing's Platform Symphony are examples of this technology.

— John Moore