Search gets smarter

5 solutions that could boost agencies’ productivity.

Citizens, federal employees, information technology administrators and agency executives all share a common need to locate information as quickly and efficiently as possible. Yet search technologies often return a myriad of results, few of which, if any, contain the precise information people are seeking.

Agencies can and should inject efficient search technologies at the appropriate points across the enterprise. Obviously, there are solutions to simplify Internet-based searching. Moreover, you may want to consider a separate, role-based solution to support agency searching across an intranet. Finally, if your employees are heavy searchers, you may want to implement client-side tools that enable efficient crawling of multiple search engines simultaneously.

We recently surveyed some of the available search technologies to gauge progress. We found that search tools are improving, and we believe search will only grow more efficient. Keeping an eye on the progress of search tools will enable your agency to capitalize on ongoing improvements while keeping the budget in check.

USA.gov: Something for everyone

In January, a rebranded FirstGov emerged. Now called USA.gov (www.usa.gov), this Web-based search portal has several unique features that are well-suited for a variety of audiences.

From the starting page, a tabbed presentation provides easy access for the public, businesses and agency employees. New in the latest incarnation of this search portal is the ability to chat with a person in real time. Visitors to USA.gov can chat with government employees Monday through Friday from noon to 8 p.m. Eastern Time.

Underlying the USA.gov site are two complementary search-related technologies. The first is Vivisimo (www.vivisimo.com), a search technology that produces clustered results in a manner that makes it easier to pinpoint accurate information. The second is MSN search (www.msn.com).

Vivisimo, in particular, stands out from other available tools. When searching on USA.gov, we found that search results were placed in clustered folders, which were then accessible by topical area, agency or source. The clustered approach made it much easier to locate relevant information.
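To make the idea of clustered folders concrete, here is a minimal Python sketch that groups results by source. It is not Vivisimo's algorithm, which the company does not describe here; the function name, the result fields and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cluster_by_source(results):
    """Group search results into folders keyed by the host name of each URL."""
    folders = defaultdict(list)
    for result in results:
        host = urlparse(result["url"]).netloc.lower()
        # Drop a leading "www." so www.irs.gov and irs.gov share one folder.
        if host.startswith("www."):
            host = host[4:]
        folders[host].append(result["title"])
    return dict(folders)


# Hypothetical sample results, invented for illustration only.
SAMPLE_RESULTS = [
    {"title": "Tax forms", "url": "https://www.irs.gov/forms"},
    {"title": "Filing deadlines", "url": "https://irs.gov/deadlines"},
    {"title": "Passport renewal", "url": "https://travel.state.gov/passports"},
]
```

Calling cluster_by_source(SAMPLE_RESULTS) yields one folder for irs.gov holding two titles and one for travel.state.gov holding one. Clustering by topic, as opposed to source, would need text analysis that this sketch does not attempt.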

The USA.gov portal — via Vivisimo and MSN — also offers the ability to return results found in a variety of sources, including frequently asked questions forms, audio materials, office documents and PDFs.

Google: Specialized search for government
Although Google (www.google.com) may often be considered synonymous with Internet searches, the company offers other solutions that may be well-suited for federal agencies. In particular, Google offers a specialized U.S. Government Search (www.google.com/unclesam).

Google’s government search covers the .gov and .mil domains, along with selected relevant sites in the .com, .us and .edu domains. When searching Google’s government site, we found that if we just entered the search term and pressed Enter, the search engine returned only general Web content.

After entering the search term and clicking on the Search Government Sites button, we were able to obtain government-specific search results. Although Google U.S. Government Search does not provide federal, state and local domains as granularly as USA.gov does, it is possible to use the Advanced Search feature to define which domains are culled for results.
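Restricting results to particular domains is simple to illustrate. The Python sketch below keeps only URLs whose host name ends in .gov or .mil. It is an assumption-laden illustration, not a description of how Google filters results, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Domain suffixes treated as government sites in this sketch.
GOVERNMENT_SUFFIXES = (".gov", ".mil")


def is_government_url(url, suffixes=GOVERNMENT_SUFFIXES):
    """Return True when the URL's host name ends with one of the suffixes."""
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(suffixes)


def filter_government(urls):
    """Keep only the URLs that point at government hosts."""
    return [url for url in urls if is_government_url(url)]
```

The check looks at the host name only, so a commercial page with "gov" somewhere in its path is not mistaken for a government site.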

Aside from searching capabilities, Google enables users to customize their interface through the use of a Google log-in. Once logged in, users can customize their interfaces by using the content directory to populate their pages with RSS feeds from various sites.

For users seeking search capabilities within agency walls, a Google hardware appliance may be a good option to consider. The company offers two models, the Google Search Appliance and the Google Mini. Both can search through content and support more than 220 different formats. The former can scale search support to more than 500,000 documents while the latter sports capabilities to search 100,000 to 300,000 documents.

If your agency uses geospatial tools, you might want to consider using Google’s Earth Enterprise and Maps for Enterprise. Earth Enterprise uses your own images or Google’s satellite imagery and can be scripted into a service or application, while Maps for Enterprise can be used to create detailed mapping applications.

Finally, Google offers the typical search toolbar for agency employees’ Web browsers and Google Desktop for Enterprise. The latter could be more useful than you think because documents often remain on desktops instead of making their way onto the appropriate server. Cataloging the contents of desktops will help preserve agency assets. Agencies that are interested in exploring other desktop search options will want to examine Beagle (beagle-project.org).

RetrievalWare: Server-side search and retrieval
Convera is also addressing search and retrieval on the server side inside the enterprise or agency walls. The company’s RetrievalWare solution is geared to reducing the cost and time it takes to locate accurate information within the enterprise.

You could think of RetrievalWare as an intra-enterprise metasearch tool because it can reach across multiple types of file systems, portals and repositories to locate the information an agency needs. Examples of the systems it works with include Red Hat Enterprise Linux AS 4, Oracle 10g and IBM’s WebSphere Application Server.

RetrievalWare can support a variety of indexing methods, including distributed, parallelized indexing for large document collections. The Convera solution also supports content filtering and concept and entity extraction regardless of the content. Moreover, this solution can address structured, unstructured and semistructured data types.

If your agency is information-intensive, RetrievalWare is worth considering because it offers automatic and dynamic classification support, useful administration features such as index alerting, and access control support that can tie into your existing security implementation.

Copernic: Powerful metasearch tool

Copernic Technologies is also looking to make search and retrieval more efficient. It offers a variety of tools that are most helpful to users. Like Google Desktop and the Beagle project, Copernic offers an indexing function that can tap data stored on local or network drives.

However, in addition to indexing, Copernic provides a powerful metasearch facility that can retrieve information from many search engines concurrently in response to a simple or advanced user query.
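The general shape of a concurrent metasearch can be sketched in a few lines of Python. This is not Copernic's implementation. The two engine functions are hypothetical stand-ins that return canned URLs, and a real tool would call each engine over the network.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real engines: each takes a query and returns URLs.
def engine_a(query):
    return [f"https://a.example/{query}", "https://shared.example/page"]


def engine_b(query):
    return ["https://shared.example/page", f"https://b.example/{query}"]


def metasearch(query, engines):
    """Query every engine concurrently and merge results, dropping duplicates.

    Assumes `engines` is a non-empty list of callables.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # map() returns results in the order of `engines`, so output is stable.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

With these stubs, metasearch("grants", [engine_a, engine_b]) returns three URLs, because the page both engines report is listed only once.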

For example, we used Copernic’s government-related engines and executed several searches. As each search progressed, we could see it traversing all of the engines, and the results of the search were saved to a folder.

Equally useful, Copernic analyzed the results it found before letting us work with the results. The analysis included strict link checking for all of the results, which saved us time by eliminating invalid or inactive results.

Two other Copernic features, tracking and summarizing, will also pique the interest of agency employees. The tracking component automatically monitors Web pages and detects any content changes or updates made to the pages. Copernic sends an e-mail message to advise the user of the content change. The search engine also highlights the content that has changed so users keeping tabs on longer documents can significantly reduce their read/update times.
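Change tracking of this kind can be approximated with a content fingerprint plus a line-by-line comparison. The Python sketch below uses the standard hashlib and difflib modules. It illustrates the general technique only, and the function names are assumptions, not Copernic's.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a SHA-256 digest that changes whenever the page text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_lines(old_text, new_text):
    """Return the lines removed ("- ") or added ("+ ") between two versions."""
    if fingerprint(old_text) == fingerprint(new_text):
        return []  # identical content, nothing to report
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # Keep only additions and removals; skip unchanged lines and "? " hints.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

A monitoring tool could store the fingerprint from each visit, compare it on the next visit and only run the more expensive comparison, and send a notification, when the two differ.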

Summarization technology uses statistics and various algorithms to detect the key points within a document. This function then extracts the relevant material to create a condensed version of the original document with just the critical items included.
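One classic statistical approach, word-frequency scoring, gives a feel for how extractive summarization can work. The article does not say which algorithms Copernic uses, so the Python sketch below is a swapped-in textbook technique. The stopword list and function name are assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are",
             "for", "on", "that", "it", "this", "with", "as", "by"}


def _content_words(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent across the document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average word frequency, so long sentences are not favored unfairly.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Sentences that repeat the document's dominant vocabulary score highest, so an off-topic aside tends to drop out of the condensed version.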

Memex: An intelligent engine
Another specialized search facility is available from Memex (www.memex.com), and it specifically targets the intelligence and law enforcement communities. The Memex Intelligence Engine provides facilities that support highly accurate searches on huge amounts of structured and unstructured data in seconds.

Data can be located even if it is entered into an incorrect field in a relational database or if it is buried inside a large PDF. Memex can also identify locations, proper names and relationships.

In particular, Memex provides facilities, such as a query builder, so analysts don’t have to be query experts to be productive with the solution. Likewise, Memex is geared for efficiency: index updates are committed in real time, and the company says its compression shrinks stored data by about 60 percent relative to its original size.

Users can secure data stored in Memex at the field level, if necessary. Moreover, the Memex solution includes advanced searching methods, such as sounds-like, range and keyword searches.
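A sounds-like search is commonly built on a phonetic code such as Soundex, which maps names that sound alike to the same short key. The article does not say which method Memex uses, so the Python sketch below shows American Soundex purely as an example of the technique.

```python
# Build the letter-to-digit table used by American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex key for a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters that share a code
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            encoded += digit
        previous = digit  # vowels reset this to "", so a code may repeat after one
    return (encoded + "000")[:4]  # pad with zeros, then truncate to four characters
```

Both soundex("Smith") and soundex("Smyth") produce S530, so a query for either spelling could match records filed under the other.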

The clusters have it

Aside from USA.gov, there are some other general search engines that employ clustering or visualization techniques to improve the precision and efficiency of search results. Northern Light (www.northernlight.com), a pioneer in the field of clustered search engine results, also has an enterprise search engine that can be customized to suit most agencies.

Other search engines, such as Clusty (www.clusty.com) and Mooter (www.mooter.com), provide clustered output that can be refined to yield highly accurate results. A similar metasearch tool, Kartoo (www.kartoo.com), submits user queries to multiple search engines and reports results in a highly visual, mapped form. Users can move from one visual map to the next to refine results.

Activating search smarts
Search engine technology is once again undergoing a metamorphosis, but that should hardly be surprising.

The technology underlying search engines still has its roots in the information retrieval field, which dates back more than 50 years.

In a 1966 Scientific American article, author Ben Ami Lipetz concluded that information retrieval would not evolve as a technology until researchers understood the various ways that humans process information. We are only now beginning to narrow our focus to gain a deeper understanding of that process.

Even with all the content already available to search engines today, by some accounts more than 500 times as much information has yet to surface because of the limitations of current search engine technology in handling different types of content. Nevertheless, progress is under way.

Some personal digital assistants and cell phones can now provide real-time tools for location-based information searches. Search histories and Web browsing behavior captures are also helping search providers to support more refined searching capabilities.

In the future, search technology and data mining will become more closely melded. Together with advances in user interface design, that will yield the next leap forward in information exposure — like it or not.

Agencies that keep an eye on the advancing field of search technology should be able to adopt new tools early and often enough to gain a competitive advantage.

Biggs is a senior engineer and freelance writer based in Northern California.