Data explosion strains peer review

NIST steps in with a system for researchers to exchange and validate scientific data.

A common complaint about the Web is that users can easily get overwhelmed while trying to sort through all the information published on it. When scientists at the National Institute of Standards and Technology confronted that issue, they turned to the Web to fix the problem. In NIST’s case, the production and publishing of thermodynamic data in industry journals caused the overload.

Recent improvements in measurement equipment mean that an already voluminous amount of thermodynamic data is doubling every 10 years. That explosive growth is straining the traditional journal-based peer-review system and causing increasing numbers of errors to creep into the data. Companies in the chemical, pharmaceutical and energy industries depend on accurate data for their engineering applications and research projects.

As part of its responsibility for promoting U.S. competitiveness through standards and technology development, NIST worked with industry partners to create a standard data format and online system for verifying and disseminating thermodynamic data. The agency’s simple and effective solution is attracting the attention of other groups that need to share large amounts of complex data, said Michael Frenkel, director of NIST’s TRC — formerly the Thermodynamics Research Center.

When TRC officials took on the challenge of improving access to thermodynamic data, they knew it would not be easy. “Ever since computers became a part of the modern technology infrastructure, there have been efforts to promote more efficient propagation of this data,” Frenkel said. “A number of projects were tried, and they all failed.”

Fortunately, by the time NIST launched its project, Extensible Markup Language had matured enough to facilitate the sharing of information among different computing platforms via the Web. Choosing XML was a big reason for the project’s success, Frenkel said.

NIST’s global data exchange system has three parts. The system takes new and historical data published in journals, transforms it into a standard format that retains the characteristics of the original and stores it in a central database that researchers can access via the Web at any time. When authors submit data, the system automatically checks for inconsistencies and alerts the authors to any questionable data. Developers say the system catches many errors that would be difficult to detect through a traditional peer-review process.

“I consider both the fast, convenient access to data [and] the system’s ability to detect inconsistencies and errors to be very important,” said Suphat Watanasiri, director of technology at Aspen Technology, which develops integrated software solutions.

Previous failures to bring data providers and users together on information-sharing plans convinced Frenkel that cultural factors were as important to overcome as technical ones. Journal publishers and researchers have their own ways of working, and to get them involved, NIST had to understand and accommodate their specific needs.

“We worked very closely with the journal publishers and editors,” Frenkel said. “It wasn’t an easy process because they were afraid that this could be a back door for others to get their copyright.”

The software tools make it easy for publishers and researchers to participate, he said, and minimize the amount of time they have to spend manipulating data or entering it into the system.

Frenkel said the project team understood that the industry would not embrace the system and its technology unless it was part of an international effort.
Therefore, the team submitted a proposal to the International Union of Pure and Applied Chemistry to make ThermoML a formal international standard for thermodynamic data.

“IUPAC is the body that sets international standards for the chemical field,” said Bryan Henry, president of IUPAC and a professor in the Department of Chemistry at Canada’s University of Guelph. “I think it’s fair to say that acceptance of ThermoML as an IUPAC standard is what got the [data-exchange system] up and running.”

Publishers are already using the Web-based system, and its popularity will only increase as others join, Frenkel said. The user side of the equation also seems assured. There are about 120,000 chemical plants worldwide for which accurate thermodynamic data is essential.

Frenkel said he anticipates that the growing acceptance of Web publishing will lead to greater use of NIST’s global data-exchange tools. “Particularly in the next 20 years, as the U.S. cyberstructure develops, having these tools of data validation available will be increasingly valuable,” he said.
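
To make the idea of automated consistency checking concrete, here is a rough Python sketch of the kind of screening such a system might apply to a submitted set of vapor-pressure measurements. The function, the input layout and the tolerance are hypothetical illustrations, not the actual ThermoData Engine algorithms.

# Hypothetical sketch of automated consistency screening for submitted
# vapor-pressure measurements; not NIST's actual algorithms.
import math

def screen_vapor_pressure(points, tolerance=0.05):
    """Return warnings for questionable (temperature_K, pressure_kPa) pairs."""
    warnings = []
    ordered = sorted(points)

    # Basic physical sanity: values must be positive, and vapor pressure
    # should rise with temperature.
    for (t1, p1), (t2, p2) in zip(ordered, ordered[1:]):
        if min(t1, p1, t2, p2) <= 0:
            warnings.append("non-physical value (T or p <= 0)")
        elif p2 < p1:
            warnings.append(f"pressure decreases between {t1} K and {t2} K")

    # Clausius-Clapeyron screening: ln(p) is roughly linear in 1/T, so a
    # large residual from a least-squares fit marks a suspect point.
    if len(ordered) >= 3 and all(t > 0 and p > 0 for t, p in ordered):
        xs = [1.0 / t for t, _ in ordered]
        ys = [math.log(p) for _, p in ordered]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        denom = sum((x - mx) ** 2 for x in xs)
        if denom > 0:
            slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom
            intercept = my - slope * mx
            for (t, p), x, y in zip(ordered, xs, ys):
                if abs(y - (slope * x + intercept)) > tolerance:
                    warnings.append(f"({t} K, {p} kPa) deviates from the trend")
    return warnings

# A small submission with one deliberately inconsistent point.
print(screen_vapor_pressure([(300, 3.5), (310, 5.6), (320, 8.8), (330, 4.0)]))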

A new scheme for data sharing

Standard exchange

The data exchange system has three main components:
  • ThermoML, an XML-based industry standard for formatting and storing thermodynamic data.
  • Software tools developed at TRC for extracting data from various academic journals.
  • ThermoData Engine, software NIST developed for evaluating research information.
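
Because ThermoML is XML-based, a record in the system is a structured document that software can read directly. Below is a minimal Python sketch of parsing such a record; the element names are invented for illustration and are not the actual ThermoML schema published by NIST and IUPAC.

# Illustrative only: these element names are hypothetical and do not follow
# the real ThermoML schema.
import xml.etree.ElementTree as ET

record = """
<dataReport>
  <compound name="benzene" formula="C6H6"/>
  <property name="normal boiling temperature" unit="K">
    <value>353.2</value>
    <uncertainty>0.1</uncertainty>
  </property>
</dataReport>
"""

root = ET.fromstring(record)
compound = root.find("compound").get("name")
for prop in root.findall("property"):
    value = float(prop.findtext("value"))
    print(f"{compound}: {prop.get('name')} = {value} {prop.get('unit')}")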

Other success factors

  • The maturity of XML, which let information be shared among different computing platforms via the Web.
  • Close work with journal publishers and editors to address their copyright concerns.
  • Software tools that minimize the time contributors spend entering and manipulating data.
  • Acceptance of ThermoML as a formal IUPAC international standard.

A reusable model of data exchange

The National Institute of Standards and Technology’s system for verifying and exchanging thermodynamic data might become a model for other agencies that need to share large amounts of complex data.

Michael Frenkel, director of NIST’s TRC — formerly the Thermodynamics Research Center — said groups outside the field have approached him about establishing similar processes.

One of the system’s selling points is its ability to handle data with more than 120 properties, which is important for other groups that deal with complex datasets. 

“I don’t think the situation is unique here” with thermodynamic data, said Suphat Watanasiri, director of technology at Aspen Technology. “We are dealing with a large amount of scientific data that [is] used to serve a broad range of process industries, and that situation might apply to other types of data that serve these and other industries.”

— Brian Robinson