Bioinformatics software spreads

NIH works with SRA to develop software for medical research

Bioinformatics software is not well known in many circles. Software products in the field, which demand both programming skill and detailed expertise in the biological sciences to design, are relatively new but are evolving as rapidly as the science they support.

Broadly defined, bioinformatics software facilitates genomic research, including the cataloguing and searching of vast databases of information about genes and the proteins for which they code.

The field is constantly widening in both the private sector and the government, according to Stewart Bernstein, vice president of sales and marketing for TurboWorx Inc., a bioinformatics software developer.

"The data continues to grow at an exponential rate, and it will continue to do so. It's not going to get any less," he said.

The federal government needs bioinformatics for medical research at the National Institutes of Health and for bioterror projects at other agencies. The list of suppliers is shorter than that for more common systems, and some unlikely players are becoming involved.

SRA International Inc., for example, has a small bioinformatics division and a long-standing contract with NIH. NIH and SRA have created two products that are available to researchers — MatchMiner and GoMiner.

In an unrelated project, NIH is collecting proposals from vendors to build several Bioinformatic Resources Centers. The centers will house databases with information on at least five microorganisms that scientists will use for research.

"Everything that we do flows from our own needs as experimentalists," said John Weinstein, a senior investigator at the National Cancer Institute who led the government's efforts to develop MatchMiner and GoMiner. His goal was to create software that his own research teams needed and have SRA turn it into products useful to the wider scientific community.

MatchMiner searches through databases to find genes that recur among them. Often a single gene and its clones are identified in many different ways, making it difficult to spot matches.

"It's difficult to do it even for one gene, having to go to seven or eight public databases," he said. "When you're doing it with multiple genes," the automation is indispensable.
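
The matching problem described above can be sketched in a few lines of Python. This is only an illustration of the general idea (reduce every alias to one canonical symbol, then see which genes turn up in every source), not MatchMiner's actual method. The alias table, the database names and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical alias table, hand-written for illustration only.
# It is not taken from MatchMiner or from any public database.
ALIASES = {
    "TP53": "TP53",
    "P53": "TP53",
    "LFS1": "TP53",
    "ERBB2": "ERBB2",
    "HER2": "ERBB2",
    "NEU": "ERBB2",
    "MYC": "MYC",
}


def canonical(name):
    """Return the canonical symbol for a gene name, or None if it is unknown."""
    return ALIASES.get(name.strip().upper())


def match_genes(sources):
    """Map each canonical gene to the set of sources listing it under any alias."""
    found = defaultdict(set)
    for source, names in sources.items():
        for name in names:
            symbol = canonical(name)
            if symbol is not None:
                found[symbol].add(source)
    return dict(found)


def shared_genes(sources):
    """Return, in sorted order, the genes that appear in every source."""
    found = match_genes(sources)
    return sorted(g for g, seen in found.items() if len(seen) == len(sources))


# Three made-up sources that name the same genes in different ways.
sources = {
    "database_a": ["p53", "HER2", "MYC"],
    "database_b": ["TP53", "neu"],
    "database_c": ["LFS1", "ERBB2", "unknown1"],
}
print(shared_genes(sources))  # prints ['ERBB2', 'TP53']
```

Done by hand across seven or eight databases, the same lookup would have to be repeated for every gene on a list, which is the tedium the automation removes.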

Genes are often provided to researchers in microarrays — collections of several genes held on a medium suitable for examination and analysis. Once the research is done, Weinstein said, the computational work moves into high gear.

"If one does a microarray experiment, or genomic or proteomic, more time is spent and there's more heartache after the analysis is done in interpreting the data," he said. "Usually, one ends up with a long list of genes and scratches the head and says what does this mean biologically?"

GoMiner, the newer product, uses the Gene Ontology — a classification system developed by the Gene Ontology Consortium — to determine how multiple genes relate to one another.

In studying cancer, for example, scientists might discover that some genes are not working when the disease is active. GoMiner can help determine if they are dealing with genes that perform similar functions by sorting out where the genes belong in the Gene Ontology.
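
A rough sketch of that sorting step, again in Python, might look like the following. It is not GoMiner's algorithm: the gene names and category assignments are invented, the real Gene Ontology is a hierarchy rather than a flat list of labels, and a real tool would add a statistical test rather than just counting.

```python
from collections import Counter

# Hypothetical gene-to-category annotations, invented for illustration.
# They are simplified stand-ins, not real Gene Ontology records.
ANNOTATIONS = {
    "GENE_A": ["apoptosis"],
    "GENE_B": ["apoptosis", "cell cycle"],
    "GENE_C": ["DNA repair"],
    "GENE_D": ["apoptosis"],
    "GENE_E": ["cell cycle"],
    "GENE_F": ["DNA repair"],
}


def category_counts(genes):
    """Count how many of the given genes fall into each category."""
    counts = Counter()
    for gene in genes:
        counts.update(ANNOTATIONS.get(gene, []))
    return counts


def summarize(flagged):
    """List (category, flagged genes, all genes) rows, largest flagged share first."""
    flagged_counts = category_counts(flagged)
    total_counts = category_counts(ANNOTATIONS)
    rows = [(cat, flagged_counts[cat], total_counts[cat]) for cat in flagged_counts]
    return sorted(rows, key=lambda row: (-row[1] / row[2], row[0]))


# Genes a hypothetical experiment found to be switched off.
flagged = ["GENE_A", "GENE_B", "GENE_D"]
for category, hit, total in summarize(flagged):
    print(f"{category}: {hit} of {total} annotated genes flagged")
```

In this made-up case all three flagged genes land in the same category, the kind of pattern that would suggest to a researcher that the genes perform similar functions.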

NIH has worked with SRA's bioinformatics team for several years, and the company is adept at the science, Weinstein said. Through what SRA calls its agile software development techniques, the software developers and the scientists work closely together.

Software developed without that close contact is typically off the mark, Weinstein said. "In my experience, it does a good job of solving the wrong problem."

David Kane, technical lead for SRA's bioinformatics group, said he uses a number of tactics drawn from the software development techniques pioneered by a consortium of programmers called the AgileAlliance.

The collaboration is especially important in developing products that must meet the needs of researchers in the future, he said.

"It's an exploratory endeavor," Kane said. "What the investigators are going to need six months or a year from now may not be what they need now." The approach gives the scientists on the team a lot of flexibility to change direction.

SRA has devoted about 30 of its 2,500 employees to the bioinformatics division, said Ernst Volgenau, SRA's president and chief executive officer.

The company moved into the field slowly as a result of a series of contracts, he said. "Many years ago, we won a job for the Food and Drug Administration. The purpose of the job was to use information technology to expedite the approval of pharmaceuticals. Meanwhile, we started to get consulting work in the health part of [the Defense Department] and [the Department of] Health and Human Services."

Building on the earlier work, SRA competed for and won an NIH contract for a less specialized job: network management. "They have a huge network on that campus," Volgenau said. "Their network has grown and they wanted it better maintained. In the meantime, they, like other biological research organizations, have come to realize the value of having common databases. A scientist with a team of a few people can now share data with others."

As the work progressed, SRA's domain expertise expanded, he said. "As time went on, we began to get people who had backgrounds in bioinformatics because of customer need."

Volgenau intends to continue the company's bioinformatics work and points to an expanding range of opportunities in the government in biodefense and related fields.

"We've seen steady growth. It's not like a big job that you get and put a lot of people on," he said. "The work in bioinformatics is evolutionary. I don't think in the short term that it's going to have any dramatic effect on our revenue. But in the longer term, it's very strategically important to us."

***

What is bioinformatics?

Bioinformatics is the use of software to analyze genome sequences, study proteins and genes, catalogue and search genomic databases, and manage biological information. It is used in medical research, the study of bioterrorism agents and other life science applications.
