National Library of Medicine prescribes XML

XML is the technology behind the largest database of published medical information in the world

XML helps the U.S. National Library of Medicine input, store, process and output the data in its MEDLINE database, said Simon Liu, director of information systems at NLM.

MEDLINE is the largest database of published medical information anywhere, holding more than 11 million citations to articles published in more than 40,000 medical journals from 70 countries. NLM has been using XML in the MEDLINE system for almost three years, making the library one of the government's early adopters of the technology, Liu said.

"In the past, [publishers] were using hard copies to send it to us and then we had to manually input it," Liu said. "But now we use an XML format for the input process and into the metadatabase."

For storage, NLM currently uses mostly Oracle Corp. databases and a few XML databases, but Liu expects a large jump in the amount of information stored in XML format by the end of next year.

Because it is the largest medical citation resource on the planet, MEDLINE has users all over the world, and the data, or "output," they receive is now generated in an XML format, Liu said.

"In the past, we stored in a plain text structure, and output was in a different format also," he said. "But XML supports Unicode [a character-encoding standard], so all users from around the globe who speak different languages can use it. XML is portable from machine to machine and system to system, but also from language to language."

One drawback of using XML is that NLM has to write its own document type definition (DTD) files to produce the format MEDLINE requires. A DTD defines the structure of a particular kind of XML document: which elements and attributes it may contain and how they nest.

"In the future, doing that should be easier, assuming the vendors come up with more tools to allow us to do customization," Liu said. "Now, for XML-based DTDs, we have to do it by ourselves."
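
For readers unfamiliar with DTDs, the idea can be sketched in a few lines. The record below is purely illustrative: the element names (Citation, Title, Journal, Author), the `lang` attribute, and the sample values are assumptions made for this example and are not taken from NLM's actual MEDLINE DTD. The sketch uses Python's standard library, which reads the DTD but does not validate against it; a validating parser would be needed to enforce the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical citation record with an inline DTD.
# Element and attribute names are illustrative only, not NLM's real format.
RECORD = """
<!DOCTYPE Citation [
  <!ELEMENT Citation (Title, Journal, Author+)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Journal (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ATTLIST Citation lang CDATA #REQUIRED>
]>
<Citation lang="fr">
  <Title>Étude sur la santé publique</Title>
  <Journal>Revue Médicale Hypothétique</Journal>
  <Author>Müller, J.</Author>
  <Author>Nakamura, 健</Author>
</Citation>
""".strip()


def parse_citation(xml_text):
    """Turn one citation record into a plain dictionary."""
    # ElementTree accepts the DOCTYPE but does not check the document
    # against it; this only demonstrates reading the structure.
    root = ET.fromstring(xml_text)
    return {
        "lang": root.get("lang"),
        "title": root.findtext("Title"),
        "journal": root.findtext("Journal"),
        "authors": [author.text for author in root.findall("Author")],
    }


citation = parse_citation(RECORD)
print(citation)
```

The DTD at the top says a Citation must contain one Title, one Journal, and at least one Author, in that order. Because XML text is Unicode, the accented and non-Latin characters in the sample come through unchanged, which is the portability "from language to language" that Liu describes.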