Forces to test text-mining tool

Text-mining technology quickly combs through millions of documents in numerous languages

U.S. Joint Forces Command

As part of an upcoming war games experiment focused on information sharing among coalition partners, U.S. Joint Forces Command will test a text-mining technology that quickly combs through millions of documents in numerous languages.

The tool is intended to help analysts identify relationships among targeted people, places and events.

The Multinational Limited Objective Experiment 2 is scheduled for Feb. 10-28, 2003, and includes Australia, Canada, Germany and the United Kingdom. It will explore security issues with the goal of examining how to build an operational net assessment (ONA) in a distributed, collaborative environment, according to Joint Forces Command officials.

ONA is a continuous information-gathering process that builds a knowledge base covering coalition forces' awareness of each other, the environment, the adversary and the adversary's perception of the U.S. and its allies, according to Joint Forces Command.

Participants will use ClearForest Corp.'s ClearResearch text-mining software, which uses Extensible Markup Language to tag and analyze data. Using the software, participants will be able to display information on a single screen and then collaboratively develop a military response in far less time than it takes now, said Barak Pridor, chief executive officer of ClearForest.

"Intensive research needs require manually plowing through enormous amounts of documents individually," which can be highly labor-intensive and also can hamper knowledge discovery across documents, said James Rowley, Joint Forces Command's knowledge management engineer. "With the right rules defined, ClearResearch permits discovery of relationships between documents within a very large document repository. It identifies and interrelates key entities, facts, and events, creating a broad overview of information buried within vast amounts of unstructured content."

For example, the company recently ran a search on Yahoo for the terms "terror and bin Laden" and then tagged the 30,000 resulting documents. The results were broken down into numerous individual searches, including one illustration showing all the people related to al Qaeda, with Osama bin Laden closest to the center. Another graphic displayed documents supporting a relationship between a charitable trust organization and the Taliban, Pridor said, adding that all of the results maintain links back to the original documents.

Depending on the number of documents involved, ClearResearch can complete a search and display the desired interpersonal or organizational relationships in anywhere from a few seconds to about two minutes, Pridor said.

ClearResearch is not part of the exercise's formal objectives, and its use by the other participating nations is voluntary. But the technology, which supports English, Arabic, French, Spanish, Portuguese and German, does "show great potential to help in discovery of relationships and identification of insights across a large document collection," Rowley said. "Our goal is to experiment with its applicability and value to the ONA development process."

New York-based ClearForest first demonstrated ClearResearch to Joint Forces Command in April, and the company has already put military staff through its four-hour training course, Pridor said. He added that next year's experiment will mark the first time a U.S. agency has tested the tool, although the Israeli defense agency has used it.

A basic version of the software costs $50,000, but more advanced versions can cost up to millions of dollars per user.
