Feds launch bioinformatics centers

Institute contracts for databases and portals to compile info on diseases


Officials at the National Institute of Allergy and Infectious Diseases (NIAID) are enlisting private companies and universities to help make data available about disease-causing organisms.

Under eight five-year contracts, companies and academic institutions will develop databases and Web portals to compile information about a small number of pathogens, the germs that cause disease. Some of the pathogens are considered possible bioterrorism agents, while others are public health concerns unrelated to terrorism.

With NIAID footing the bill, the data will be freely available to scientists, said Valentina Di Francesco, bioinformatics program director in NIAID's Division of Microbiology and Infectious Diseases. Officials at NIAID, which is part of the National Institutes of Health, have awarded seven Bioinformatics Resource Center (BRC) contracts, and one more is in negotiations, she said.

The total value of the contracts has not been determined, but it will be at least $88.3 million, according to institute figures.

"The major goal is to provide the scientific community with a robust point of entry for accessing data," Di Francesco said. "This database has to be user-friendly to the research community."

The study of how an organism's genes are arranged and interact is called genomics. Genes order the body's cells to produce proteins, which affect the body's processes. The study of the proteins is called proteomics. The BRCs will support both disciplines and related fields.

The other component of the BRC project is to develop and distribute open-source software for researchers to use in viewing and managing data, Di Francesco said. This includes developing a set of standards for systems to freely exchange genomic data.

Bioinformatics is a difficult field to work in because it requires collaboration between biologists and computer scientists to develop systems that address the data's complexity and are useful to researchers, she said.

"We have a couple of folks on staff who are Ph.D. biochemists and who have picked up" the information technology, said Kathy Adams, senior vice president and director of the civil sector at SRA International Inc., one of the BRC participants. "You tend to come into this field in one of two ways -- through IT or as a scientist -- and you pick up the other side. You really need to marry the domain with the technology and provide a solution."

Scientists realize they need to share data to progress with their work, she said.

"It used to be that the scientists worked in their labs, and they were isolated," Adams said. "The whole thought in the last few years is that if the scientists are working on specific pathogens, if all the researchers around the world who are working on that [group of germs] are able to share information, you can speed the cure and prevention of these diseases."

For SRA's part in the BRC project, Adams' team will work with scientists at the University of Wisconsin at Madison to create a database cataloging information about enterobacteria, including E. coli and salmonella.

Like the other contract awardees, SRA's team is just getting started, she said. Members recently held an initial meeting and decided to build on a platform they have already developed.

Once the BRC is running, scientists and developers will work to encourage other researchers to use it and add their own data to it.

"In a way, it's [a matter of] build it, and they will come," she said. "Part of this is to get people to deposit data so that the warehouse will grow. We'll start to populate the warehouse with data that's out there" in publicly available databases.

At the Institute for Genomic Research (TIGR), a nonprofit research center in Rockville, Md., the BRC will focus on bacteria that cause anthrax, botulism, tularemia and melioidosis, among others, said Owen White, principal investigator. White wants to have the first version of a Web portal online within six months, he said.

Often no single genome sequence exists for an organism, he said. TIGR, for example, has 15 sequences for the anthrax bacterium. White has 18 people at TIGR involved in the project.

Officials at Northrop Grumman IT, another BRC awardee, wanted their center to include at least one bacterium, one virus and one parasite, said Kevin Biersack, the company's bioinformatics program manager.

"We also wanted pathogens that had not just a bioterror threat but also a public health threat," he said. "We did not want to just concentrate and focus in one arena." Northrop Grumman's BRC will cover the organisms that cause tuberculosis, giardiasis and influenza as well as the castor bean plant, from which the biotoxin ricin is derived, among others, he said.

Biersack's 17-member team has not had its first program meeting yet, he said, but he believes the firm will be able to build on previous work. The University of Texas Southwestern Medical Center is also on the team.

"We feel good about approaching it straight-on," he said. "The data is voluminous, and it's varied and diverse."

One challenge for Biersack is deciding whether it is better to accumulate data in a single centralized database or to spread it out in a distributed model, he said.
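The trade-off Biersack describes can be illustrated with a minimal Python sketch. It is not a description of Northrop Grumman's design. The source names, record fields and gene labels are hypothetical placeholders, and the functions are assumptions made for illustration. One approach copies every record into a single warehouse ahead of time. The other leaves records where they are and queries each source on demand.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical data sources; names and contents are placeholders.
SOURCES: Dict[str, List[Record]] = {
    "lab_a": [
        {"organism": "E. coli", "gene": "gene-1"},
        {"organism": "salmonella", "gene": "gene-2"},
    ],
    "lab_b": [
        {"organism": "E. coli", "gene": "gene-3"},
    ],
}


def build_warehouse(sources: Dict[str, List[Record]]) -> List[Record]:
    """Centralized model: copy every record into one store, tagged by origin."""
    return [
        {**record, "source": name}
        for name, records in sources.items()
        for record in records
    ]


def query_centralized(warehouse: List[Record], organism: str) -> List[Record]:
    """Search the single combined store."""
    return [r for r in warehouse if r["organism"] == organism]


def query_distributed(sources: Dict[str, List[Record]], organism: str) -> List[Record]:
    """Distributed model: ask each source in turn and merge the answers."""
    hits: List[Record] = []
    for name, records in sources.items():
        for record in records:
            if record["organism"] == organism:
                hits.append({**record, "source": name})
    return hits


central_hits = query_centralized(build_warehouse(SOURCES), "E. coli")
distributed_hits = query_distributed(SOURCES, "E. coli")
```

In this toy case both approaches return the same records. The practical differences lie elsewhere: a warehouse must be refreshed when the sources change, while a distributed query depends on every source being reachable when the question is asked.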

Access to current information is a challenge for biology and biotechnology researchers, making efforts such as the BRC project worthwhile, said Sara Radcliffe, director of scientific and regulatory affairs at the Biotechnology Industry Organization, an international association for biotech firms.

"The information coming out of genetics and genomics is just vast," she said.

The human genome, which was first mapped in 2000, provides the most dramatic example of the growth of information, Radcliffe said. Scientists continue to produce volumes of information about each organism they study.

"There's more and more information, both more detail on the actual genome and also [information regarding] what does it all mean," she said. "How does the sequence relate to proteins, what are the proteins doing, how does [the genome] relate to specific diseases?"

Making such information available in one place, rather than spread across multiple public and private databases, is extremely valuable, she said.

***

BIOINFORMATICS RESOURCE CENTERS

A partial list of the Bioinformatics Resource Center projects, funded by the National Institute of Allergy and Infectious Diseases, includes projects by:

The Institute for Genomic Research.

The University of Notre Dame, with partners including the European Bioinformatics Institute and the European Molecular Biology Laboratory.

The University of Alabama at Birmingham, partnering with the University of Victoria, Canada.

SRA International Inc., partnering with the University of Wisconsin at Madison.

Northrop Grumman Information Technology, partnering with the University of Texas Southwestern Medical Center and Vecna Technologies Inc.

The Virginia Bioinformatics Institute, partnering with the Loyola University Medical Center, Social and Scientific Systems Inc. and the University of Maryland.

The University of Pennsylvania.

Source: National Institute of Allergy and Infectious Diseases

BIOINFORMATICS GLOSSARY

Information technology supporting biotechnology research must contend with a bewildering array of data. Here is a quick guide to some common fields of study that the IT supports:

Genetics: The study of inheritance patterns of traits in organisms.

Genomics: The study of all of an organism's DNA, including genes, their locations along chromosomes and their interactions with other genes. It also includes so-called junk DNA, strands of DNA that exist outside genes and serve an undiscovered purpose, if any.

Proteomics: The purpose of genes is to instruct cells to manufacture proteins, which are complex 3-D structures built from chains of amino acids. By comparison, genes are simple, long chains of nucleotides. The field of proteomics studies proteins, their functions, their shapes and their relationships to genes.

Single Nucleotide Polymorphism (SNP): SNPs are variations in the DNA of different members of a species. Identifying and analyzing SNPs is a key use of bioinformatics.
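As a rough illustration of what identifying SNPs involves, the following Python sketch compares DNA sequences position by position and reports where they differ. It is a simplified example, not a method used by any of the centers. The sample sequences are made up, the function name is hypothetical, and the sequences are assumed to be already aligned and of equal length. Real analyses must also handle alignment, sequencing errors and how common each variant is.

```python
from typing import List, Tuple


def find_variant_sites(aligned_sequences: List[str]) -> List[Tuple[int, List[str]]]:
    """Return (position, bases seen) for each column where the sequences differ.

    Positions are zero-based. Sequences must already be aligned to equal length.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")

    sites = []
    for position in range(length):
        # Collect the distinct bases found at this position across all samples.
        bases = sorted({seq[position] for seq in aligned_sequences})
        if len(bases) > 1:
            sites.append((position, bases))
    return sites


# Made-up sequences standing in for three members of a species.
samples = ["ACGTACGT", "ACGTACGA", "ACCTACGT"]
variant_sites = find_variant_sites(samples)
print(variant_sites)  # [(2, ['C', 'G']), (7, ['A', 'T'])]
```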

Sources: National Human Genome Research Institute, National Institutes of Health, University of Kansas Medical Center and Greenwood Genetic Center.