Keeping data flowing

In the quest to build leaner, more nimble agency systems, attention turns to data.

Art and obscenity rank among those things that defy easy definition: they have a know-it-when-I-see-it quality that makes them hard to pigeonhole.

Data may be similarly enigmatic, but government managers are nevertheless attempting to define it. Some agencies have embarked on data architecture projects, which aim to organize data so that it can be more readily shared within and among organizations. Until recently, such sharing has not been a priority for federal agencies that have accumulated data in isolation. The government now faces a daunting data organization effort. A key assignment: Develop common definitions for the types of data agencies possess.

The Environmental Protection Agency and the Interior Department are among the federal entities taking the latest crack at data architecture. More may follow, particularly in light of the Office of Management and Budget's most recent addition to the federal enterprise architecture.

Last October, OMB unveiled the fifth and final element of this architecture, a data reference model intended to promote governmentwide information sharing. The model promises to provide the guidance agencies need to get moving on data architecture.

"I think what [the data reference model] will do is serve as a tool for all of the agencies who are creating data architectures and data models," said Kimberly Nelson, the EPA's chief information officer.

As for data architecture's benefits, easier data sharing generally gets top billing. But the ability to boost government programs' performance may be its keenest edge, according to government and industry executives. Data architectures improve data quality and eliminate costly redundancies, making programs more effective. That advantage is critical for selling top managers on data architecture efforts.

Once an agency gets the green light, the data architecture job shifts to execution. In that regard, guidelines for running a data architecture effort have begun to emerge. The data reference model provides assistance, but data architects also identified lessons culled from project work. For example, experts say including business and technology managers on projects should be a cardinal rule.

Fred Collins, senior enterprise architect in IBM's Global Government Services Division, said attempts to place data architecture under a technology-only umbrella are mistaken. "I think that is a recipe for failure," he said, citing the need to reach out to the business community and its leaders. Collins has worked with Interior on its data architecture.

Transcending the stovepipes

Over the years, government information systems typically have been built, maintained and defended as stand-alone entities, which is likely why officials often describe systems as stovepipes, islands of automation and silos.

"Up to now, most people have been protective of data silos and competing with each other for who has the most complete data silo," said Brand Niemann, co-chairman of the CIO Council's Semantic Interoperability Community of Practice.

"Every business vertical established its own vertical-specific application with a specific data model," Collins added.

Consequently, application developers labeled data elements — the fundamental units of data — their own way. A data element for an employee's last name, for example, could be labeled in different systems as last name, name or some other variation. Data elements usually were defined for internal use only, said Michael Holt, director of software engineering at integrator STG.
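To make the naming problem concrete, consider the minimal sketch below. The systems, records and field names are invented for illustration; it shows how the same fact, an employee's surname, can hide behind different labels and force integrators to maintain hand-built crosswalks between systems.

```python
# Hypothetical records from two stovepiped systems; the field names are
# invented for illustration. Both hold the same facts under different labels.
hr_record = {"last_name": "Rivera", "first_name": "Ana"}
payroll_record = {"NAME": "Rivera", "GIVEN_NM": "Ana"}

# Without shared data element definitions, integrators must maintain a
# point-to-point crosswalk between every pair of systems.
crosswalk = {"NAME": "last_name", "GIVEN_NM": "first_name"}

normalized = {crosswalk[field]: value for field, value in payroll_record.items()}
assert normalized == hr_record  # both records describe the same employee
```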

That's not to say that government agencies have never attempted to share data. Patrick McNabb, director of STG's enterprise architecture practice, said agencies have been wrestling with data issues and standardization for at least two decades. He cited the Defense Department's pursuit of data standardization.

"The object is to make sure the information that everyone is talking about is the same and defined correctly and shared," he said. "That goal has been around for a while."

That goal has seen increased visibility in recent years. Among the prime movers: the demand for greater data sharing and reduced data operations costs following the Sept. 11, 2001, terrorist attacks. But as the need to share data from application to application intensified, agency officials realized they had no common taxonomy to make sharing possible, Collins said.

In the government, some initial attempts to use a common data classification scheme involved the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 11179 standard. The standard offers guidance for naming data elements and provides for a registry of data elements and associated metadata, high-level data used to describe the data elements.

Registries provide information on the types of data an organization has, where it is housed, its source and the format in which it is available. In effect, they act as catalogs for an organization's information stores.
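A registry entry can be pictured as a small structured record. The sketch below loosely illustrates the kind of metadata such registries capture; the field names are illustrative assumptions, not the actual ISO/IEC 11179 schema.

```python
from dataclasses import dataclass

@dataclass
class DataElementEntry:
    """One catalog entry in a metadata registry (illustrative fields only)."""
    name: str            # standardized data element name
    definition: str      # precise, unambiguous definition
    representation: str  # data type, such as text or integer
    source_system: str   # where the authoritative copy is housed
    data_format: str     # format in which the data is available

entry = DataElementEntry(
    name="employee-last-name",
    definition="The surname of a current or former employee.",
    representation="text",
    source_system="HR master database",
    data_format="UTF-8 string, maximum 64 characters",
)
```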

ISO/IEC 11179 proponents argue that consistently documented data is easier to locate, retrieve and share. The standard's charter states, "Precise and unambiguous data element definitions are one of the most critical aspects of ensuring data sharability."

The EPA's Environmental Data Registry, the Federal Aviation Administration's Data Registry and the Australian Institute of Health and Welfare's Knowledgebase are examples of metadata registries built according to ISO/IEC 11179.

Enter the data reference model

Against this backdrop comes the data reference model. According to OMB, "The [data reference model's] primary purpose is to promote the common identification, use and appropriate sharing of data/information across the federal government."

The model builds on ISO/IEC 11179, adapting its approach for describing the structure of data. The data object is the basic element, which is further described by a data property and a data representation. For example, the model document states that "vaccine" would be the data object; the name, weight or potency of the vaccine would be the property; and text or integers, meaning whole numbers, would be the representation.
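The vaccine example maps naturally onto a simple class structure. The sketch below is one reading of the model's object/property/representation layering, with invented class and attribute names; it is not code from the data reference model itself.

```python
from dataclasses import dataclass

@dataclass
class DataProperty:
    name: str            # e.g., "name", "weight" or "potency"
    representation: str  # e.g., "text" or "integer"

@dataclass
class DataObject:
    name: str
    properties: list[DataProperty]

# The model document's example: "vaccine" is the data object; its name and
# potency are properties; text and integers are the representations.
vaccine = DataObject(
    name="vaccine",
    properties=[
        DataProperty(name="name", representation="text"),
        DataProperty(name="potency", representation="integer"),
    ],
)
```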

The model's emphasis on data structure will spark much learning among federal officials, Nelson said. "As we model [data structures] in our respective agencies, we will have a better understanding of what opportunities exist for sharing that information," she said.

The data structuring exercise lets agency officials see what data they are collecting and determine what groups inside and outside the agency might be interested in it, Nelson added.

In addition to data structure, the data reference model focuses on categorization and the exchange format, Nelson said. Categorization places data in a business context and addresses how an agency uses data to support a particular line of business. The exchange format, she said, covers how pieces of information are grouped and shared.
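As a rough illustration of the exchange idea, the sketch below groups a few related facts into a single XML payload of the sort agencies might agree to share. The element names are invented; the model does not prescribe this particular layout.

```python
import xml.etree.ElementTree as ET

# Group related data elements into one exchange package. The element names
# are invented; in practice an agency would define them in an agreed-upon
# XML schema.
package = ET.Element("FacilityReport")
ET.SubElement(package, "FacilityName").text = "Springfield Water Plant"
ET.SubElement(package, "PermitNumber").text = "WA-0001"
ET.SubElement(package, "LineOfBusiness").text = "environmental management"

print(ET.tostring(package, encoding="unicode"))
```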

According to some observers, the data reference model is the federal enterprise architecture's central model — and the hardest to make happen.

"I still think of data as the basic object for which all these other models are arrayed," said Dan Twomey, recently appointed chairman of the Industry Advisory Council's Enterprise Architecture Shared Interest Group. "It's the coin of the realm. Applications only work if they've got data to work on."

Data architecture, modeling and analysis lie "at the heart of the enterprise architecture," said Michael Tiemann, manager of the enterprise architecture practice at AT&T Government Solutions. Architecture can't be effectively implemented without a significant effort put into data architecture, he added.

To that end, the data reference model provides a structure through which agencies can put their data houses in order. In theory, the model will impose a consistent, governmentwide method for organizing data. How it will work in practice is another story, some executives say.

"It remains to be seen how valuable the reference model is," said Michael Beckley, co-founder and executive vice president of product strategy at Appian. "The model doesn't solve the basic, hard challenge to get people to agree on specific [Extensible Markup Language] schemas and where to register them. [The model] is a design pattern, but people still need to build in that pattern."

Other observers say the data reference model remains conceptual and difficult for agency managers to understand.

Greater definition could be on the way, however. OMB has tapped Michael Daconta, metadata program manager at the Homeland Security Department, to advance the model.

Daconta heads a working group that will revise and complete the data reference model's five volumes. Last month, the group agreed on a strategy for producing the volumes. The first volume will provide an overview and introduction to the data reference model, while the second will offer a management strategy. The other three will correspond to the model's data description, exchange and context layers.

The management strategy volume will include a section on governance, Daconta said. Industry executives say governance is an important issue, noting that incentives are needed to foster interagency data sharing and collaboration. In the past, OMB has linked funding to the achievement of various technology directives.

"OMB is still reviewing the governance process," said an agency official who requested anonymity because of OMB policies. "It is the goal of the CIO Council and [OMB's] E-Government and Information Technology Office to ensure proper and efficient uptake by the agencies. No particular types of incentives have been decided at this point."

Daconta, however, said the need to improve information sharing will encourage agencies to adopt the data model.

Getting started

In the absence of a financial stick, agencies can still find motivation to pursue data architecture. Benefits include the often-cited improvement in data sharing. In addition, agencies eyeing service-oriented architectures will find the migration easier if they first obtain a solid understanding of their data, government and industry executives say. An architecture's data discovery aspect helps an organization determine what types of services — reusable software components — can be developed and made available for others.

But to get managers' attention, data architecture's potential to improve mission effectiveness may be the prime selling point.

"I've had the best success building the case [for data architecture] on the impact to the organization," said Michael Brackett, a former Washington state IT executive and now a data architecture consultant. "Selling it as something nice to do is not going to fly."

Moore is a freelance writer based in Syracuse, N.Y.

Data architecture's future

The International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 11179 standard is one of the fixtures in the data architecture space and has already been a critical piece of a few government projects. It is also part of the federal government's data reference model.

Despite the standard's success, developers continue to search for ways to improve it. Indeed, some data architects are now seeking to push beyond ISO/IEC 11179, which has limits: it primarily handles structured data, and achieving interoperability among systems still requires some human intervention rather than computers alone.

"That has been a good start," said Brand Niemann, co-chairman of the Semantic Interoperability Community of Practice (SICOP) within the CIO Council's Best Practices Committee. "But the new paradigm is to take that up a level and to make

[a data architecture approach] apply to both structured and unstructured data and make it machine-processable."

And that new paradigm is semantic computing. This approach seeks to overcome cultural obstacles created by different vocabularies for different IT systems. For example, one system might refer to "price," while another system uses "cost," according to a SICOP white paper on semantic technologies.

To reconcile such differences, semantic computing seeks to rationalize divergent information sets through software. The idea is to structure data formally to eliminate ambiguity and allow computers to make automated inferences when performing tasks such as data retrieval. The approach thus avoids the use of "point-to-point data and terminology mappings, processes that are both time- and personnel-intensive," the white paper states.
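A bare-bones sketch of that idea follows, with an invented vocabulary. Both systems' local terms map to one shared concept, so software can resolve a query through the concept rather than through pairwise term mappings.

```python
# Map each system's local term to a shared concept; this dictionary stands in
# for a formal ontology. The terms and concept name are invented.
shared_concept = {
    "price": "monetary-amount",  # vocabulary of system A
    "cost": "monetary-amount",   # vocabulary of system B
}

record_a = {"price": 125.00}
record_b = {"cost": 125.00}

def value_for(concept, record):
    """Retrieve a value by shared concept rather than by local field name."""
    for term, value in record.items():
        if shared_concept.get(term) == concept:
            return value
    return None

# Both lookups succeed without a point-to-point mapping between A and B.
assert value_for("monetary-amount", record_a) == value_for("monetary-amount", record_b)
```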

One advantage of semantic computing is that computers would be able to interact with Web services software components without human intervention. Michael Beckley, co-founder and executive vice president of product strategy at Appian, said semantic computing is mostly wishful thinking at this point. But he added that the concept foreshadows computer systems that "will be far more pervasive users of the Internet than we are."

Semantic computing may also shape the future of the data reference model. Members of the federal DRM Working Group have asked SICOP to create a semantic computing profile for the model. Michael Daconta, metadata program manager at the Homeland Security Department and leader of the working group, said semantic computing is not ready to roll out at an operational level, adding that SICOP is the perfect group to investigate the technology road map.

— John Moore

How big a bite?

Management approval sets the stage for the data architecture. At this phase, agencies face a dilemma: Can they reconcile the push for a governmentwide approach with the need to keep projects manageable?

To make a go of it, Kimberly Nelson, the Environmental Protection Agency's chief information officer, said agencies must make a broad commitment to enterprise architecture as the right thing to do. But the initial data architecture initiative need not be an all-encompassing, agencywide affair. Agencies that have established a general architectural framework can "drill down where [they] have programmatic priorities," she said.

But although smaller may be better, architects are counseled against building a data architecture for every information technology project.

Fred Collins, senior enterprise architect in IBM's Global Government Services Division, recommended that agencies tackle data architecture by line of business or an aspect of a line of business. Lines of business, such as recreation or law enforcement, cut across organizational boundaries and multiple projects. This approach lets a data architecture cover a wide swath of an organization using a more realistic scope of work.

"We would need to have a staff 10 times larger" to pursue a data architecture enveloping all aspects of an organization, Collins said. "We're going for smaller successes."

Another important consideration is the involvement of the business and IT sides of the house. Executives "don't want some IT person or even some [enterprise resource planning] company coming in and defining their core mission function data," said Michael Tiemann, manager of the enterprise architecture practice at AT&T Government Solutions.

Data should be defined by the people who perform the mission. "The data layer of an enterprise architecture is an interesting layer because it…has one foot in the technology wonk arena and one foot in the business wonk arena," he added.

Those initial steps in data architecture could be telling.

"I've always said the technical aspects of data and enterprise architecture are dwarfed by the organizational and cultural impediments that need to be overcome," Collins said.

— John Moore