Is XML too prolific?

General Services Administration study to advise on need for government repository

The beauty of Extensible Markup Language is that it's universal and can make information understandable to dissimilar computer systems.

But an ugly side of XML may be emerging. Many versions of it are being developed, and their differences undermine the very universality that makes XML so attractive in the first place.

There's fear that divergent XML dialects may already be developing in different government agencies. If allowed to flourish, these dialects could re-create the problem XML was intended to solve: too many languages, leaving disparate computer systems unable to communicate easily with one another.

Last month, the General Services Administration hired consulting firm Booz Allen Hamilton to advise the federal government on whether it needs to create a repository of government XML data structures. The answer will almost certainly be yes, XML experts say.

The repository would contain XML elements that represent the agreed-upon standards for how particular pieces of digital data should be labeled in XML.

GSA and Booz Allen expect to issue an interim recommendation on the repository late this month and a final recommendation in October, according to Marion Royal, GSA's expert on XML.

When creating digital documents, templates or other data, government authors and developers would consult the repository to find the proper data tags. If none exist, they would create them and submit those creations to the repository for later use by other authors. Thus the repository would keep XML from devolving into multiple, incompatible idioms.
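The consult-then-submit workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TagRepository` class, its method names, and the sample concepts are hypothetical and do not describe any system GSA or Booz Allen has proposed.

```python
class TagRepository:
    """Hypothetical in-memory stand-in for a shared XML tag repository."""

    def __init__(self):
        # Maps a data concept (e.g. "person name") to its agreed-upon tag.
        self._tags = {}

    def lookup(self, concept):
        """Return the agreed-upon tag for a concept, or None if none exists."""
        return self._tags.get(concept)

    def find_or_submit(self, concept, proposed_tag):
        """Reuse the existing tag if there is one; otherwise register the proposal."""
        existing = self._tags.get(concept)
        if existing is not None:
            return existing
        self._tags[concept] = proposed_tag
        return proposed_tag


repo = TagRepository()
first = repo.find_or_submit("person name", "name")        # no tag yet, so "name" is registered
second = repo.find_or_submit("person name", "applicant")  # a tag exists, so it is reused
```

In this sketch the second author's proposal is ignored in favor of the tag already on file, which is the behavior that would keep separate agencies from drifting into incompatible vocabularies.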

That fracturing is already happening to a degree in the business world, where different versions of XML have been created for different businesses. The XML used for human resources is different from that for supply chain management. The robotics industry "speaks" a different version from the automotive industry, as does aerospace from accounting.

The General Accounting Office warned April 5 that although agencies need to tailor XML to meet their unique needs, "they risk building and buying systems that will not work with each other in the future if their efforts do not take place within the context of a well-defined strategy."

In a report prepared for Sen. Joe Lieberman (D-Conn.), GAO endorsed building a "registry of government-unique XML data structures, such as data element tags and associated data definitions."

Such a registry or repository would serve as a guide for government system developers, enabling them to use the same "inherently governmental" data tags and definitions so their systems can communicate with one another.

Thus, if all agencies used the same XML tags to identify the same data, law enforcement agencies could better find and retrieve information about criminal suspects whether it was stored in federal, state or local databases, the GAO report says. The report included recommendations for the Office of Management and Budget, working with the CIO Council and the National Institute of Standards and Technology, to guide governmentwide adoption of XML.

If all agencies used the same data tags, XML could be extremely valuable in homeland security, said Owen Ambur. Ambur and Royal serve as co-chairmen of the XML Working Group, established by the CIO Council. The inability of agencies such as the FBI, the CIA and the Immigration and Naturalization Service to effectively share information has been identified as a critical problem for homeland security, Ambur said, and XML would help them share information more effectively.

If different XML tags are used for the same information (if, for example, the tag "name" is used to identify a person's name on a police report, but "applicant" is used on a visa application), it may not be possible to compare police data against visa records.
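A short Python sketch makes the mismatch concrete. Only the tags "name" and "applicant" come from the example above; the record layouts, the sample value, and the `SHARED_TAGS` mapping are assumptions made up for illustration, not actual agency formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; only the "name" and "applicant" tags come from the article.
POLICE_REPORT = "<report><name>Jane Doe</name></report>"
VISA_APPLICATION = "<visa><applicant>Jane Doe</applicant></visa>"

# Hypothetical mapping from agency-specific tags to one agreed-upon tag.
SHARED_TAGS = {"name": "name", "applicant": "name"}


def person_name(xml_text, tag="name"):
    """Return the text of the first child element with the given tag, or None."""
    element = ET.fromstring(xml_text).find(tag)
    return element.text if element is not None else None


def normalized_name(xml_text):
    """Return the person's name by translating each tag through SHARED_TAGS."""
    for child in ET.fromstring(xml_text):
        if SHARED_TAGS.get(child.tag) == "name":
            return child.text
    return None
```

A search for "name" finds the person on the police report but returns nothing for the visa application, even though both records describe the same person. Once both tags are translated to one shared tag, `normalized_name` returns the same value for both records, which is the kind of agreement a governmentwide repository is meant to provide up front.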

Of course, XML would have more mundane uses as well. For example, if personnel information on government employees were tagged according to governmentwide standards, it would be readily available, whether for processing a benefits enrollment or a pay raise or registering a retirement, Royal said.

GSA is working with NIST on a proof-of-concept document for a government XML repository, he said.

"It need not be a centralized repository," Royal said. Rather, parts of the repository might reside at various agencies. Indeed, a number of agencies are establishing their own XML repositories. The Defense Department has already set up a repository of defense-related XML data. The Environmental Protection Agency is also working on one, and other agencies, including the National Archives and Records Administration, are considering starting their own, Royal said.

A government repository might be created by linking those agency repositories and building on them, he said.

While endorsing the idea of building a government XML repository, GAO officials counseled caution about XML's security risks.

The markup language's ability to improve data sharing among computer systems "has the potential to increase security risks," the agency warned in its report. XML documents may create problems for virus-screening software, which may have difficulty detecting viruses in XML files or "could be tricked into processing malicious code," the GAO report says. "It is unclear how significant this potential vulnerability will be."

***

Spreading XML across government

In a recent report prepared for Sen. Joe Lieberman (D-Conn.), the General Accounting Office endorsed a registry of Extensible Markup Language data tags that agencies could use to enable their systems to communicate. GAO also recommended that the Office of Management and Budget, working with the CIO Council and the National Institute of Standards and Technology, develop a strategy for governmentwide adoption of XML that would include:

* A plan to transition the CIO Council's pilot XML registry to an "operational governmentwide resource."

* Policies and guidelines for managing and participating in the governmentwide XML registry "to ensure its effectiveness in promoting data sharing capabilities among federal agencies."

* A process for identifying and coordinating government-unique requirements and presenting that information to private-sector standards-setting organizations during the development of XML standards.
