Aug 95

Level of Govt: State, local

Function: Text retrieval; Document Management

Problem/situation: Government agencies need faster, more accurate ways to get at information stored in computer files.

Solution: Full text search and retrieval engines.

Jurisdiction: San Bernardino County, Calif., Municipal Water District

Vendors: Datapro Information Services Group, Compuserve, Prodigy, America Online, Excalibur Technologies Corp., Fulcrum Technologies, Information Dimensions Inc. (IDI), Personal Library Software Inc., Verity Inc., ZyLAB, Delphi Consulting Group, DEC, Apple, Microsoft

Contact: Karen Shegda, Datapro Information Services Group 609/764-0100; Robert Tincher, San Bernardino Valley Municipal Water District, 909/387-9244.

By Tod Newcombe

Contributing Editor

For users of imaging applications involving complex documents, indexing has always been a problem. Someone with knowledge about the document's subject matter had to analyze the scanned images and come up with a series of keyword identifiers. Despite the time and resources spent on indexing, users had no way of knowing whether they would always find everything they were looking for.

To both automate the laborious process of indexing and increase the accuracy of searches, imaging users have turned to full-text search and retrieval technology. Once a mainframe tool, search and retrieval has migrated to client/server and PC-based applications. Instead of someone manually entering keywords into an index database, document images are converted into text using optical character recognition (OCR) technology. Text retrieval software then converts the entire text file into an index, allowing users to find a document by searching for any word that appears in the text.
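The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's engine: it assumes the OCR output is already available as plain text, and the names `tokenize`, `build_index`, `search` and the two sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map every word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents that contain the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical OCR output for two scanned documents.
documents = {
    "permit-001": "Water district permit for well drilling",
    "memo-002": "Memo on drilling costs and pipeline repair",
}

index = build_index(documents)
print(search(index, "drilling"))
```

Because every word in the text becomes an index entry, nobody has to decide in advance which keywords describe a document.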

Today's search and retrieval tools can even compensate for bad spellers. Using a technology called fuzzy searching, the software reduces a query term to its root form, making it possible to locate words spelled several different ways. The software will even rank the relevance of documents it finds by the number of times the root word appears. Taking the same approach one step further, search and retrieval is going beyond text searching and is being applied to other forms of information, including digital images, video and sound.
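The root-form matching and count-based ranking described above might look roughly like the following Python sketch. It is an assumption-laden simplification: the suffix list is a crude stand-in for the pattern recognition and statistical methods commercial products use, and the names `root`, `rank` and the sample documents are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list; real products are far more thorough.
SUFFIXES = ("ing", "ed", "es", "s")


def root(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def rank(documents, query):
    """Score each document by how often the query's root form appears in it."""
    target = root(query)
    scores = Counter()
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z]+", text.lower())
        count = sum(1 for w in words if root(w) == target)
        if count:
            scores[doc_id] = count
    # Highest count first, mirroring relevance ranking by root-word frequency.
    return scores.most_common()


# Hypothetical document texts.
documents = {
    "a": "Drilling permits: the crew drilled and drills again",
    "b": "One drill on site",
    "c": "Pipeline repair",
}

print(rank(documents, "drills"))
```

A query for "drills" matches "drilling," "drilled" and "drill" alike, and the document that uses the root most often is listed first.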

"In the near future, we're going to see more information retrieval as opposed to just text," predicted Karen Shegda, associate managing editor for Datapro Information Services Group. As for search and retrieval today, Shegda said much has improved. "The products have gotten easier to use and the software is much more sophisticated." She mentioned that the leading products use pattern recognition, algorithmic and statistical systems that simplify queries while improving the precision of each search.


Search and retrieval technology has been around for at least 25 years. For most of that time, it has been used in mainframe and minicomputer applications. Today, the technology also runs on stand-alone PCs, workstations and client/server systems.

It is also entering new markets, thanks to the recent growth in electronic publishing, CD-ROM, document imaging, and information superhighway services, such as the Internet, Compuserve, Prodigy and America Online. With their huge databases and millions of customers, these services need search tools that are fast and easy to use.

Also aiding the growth of the search and retrieval market is the falling cost of mass storage for computers. As a result, corporate America and government are storing more information electronically than ever before.

Where once just a few vendors served the market, today there are many. Leading names include Excalibur Technologies Corp., Fulcrum Technologies, Information Dimensions Inc. (IDI), Personal Library Software Inc., Verity Inc. and ZyLAB. Typically, these vendors sell their search "engines" to other vendors, such as imaging software developers, who integrate the engines into their imaging systems.

According to Shegda, a full-fledged, sophisticated text retrieval system can cost under $500 per user