in a network. Most packages range from $395 to $995 for a single user. The majority of the retrieval packages run under the Windows interface; a smaller number are available on the Macintosh operating system.
Text retrieval tools offer users several different ways to find the file or document they are looking for. They can search by keyword, which automatically retrieves exact matches. This is the simplest form of indexing and retrieving files. However, keyword searches require someone to define the relevant terms and assign them to the file. Incorrect keywords or the occasional keyword with dual meanings can reduce the accuracy of these kinds of searches.
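As a rough illustration of keyword searching, the sketch below assumes a small, hand-built catalog in which a person has assigned keywords to each file. The file names, keywords and the `keyword_search` function are hypothetical and do not come from any product mentioned here.

```python
# Hypothetical catalog: each file name maps to keywords a person assigned to it.
CATALOG = {
    "permit_1994.txt": {"water", "permit"},
    "budget_1995.txt": {"budget", "water"},
    "memo_pcs.txt": {"computer", "purchase"},
}


def keyword_search(catalog, keyword):
    """Return the files whose assigned keywords contain an exact match."""
    wanted = keyword.lower()
    return sorted(name for name, keywords in catalog.items() if wanted in keywords)


print(keyword_search(CATALOG, "water"))   # files tagged with "water"
print(keyword_search(CATALOG, "waters"))  # no exact match, so nothing is returned
```

The second query shows the weakness described above: a file is found only if someone assigned exactly the term the searcher types.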
The Boolean system, which was designed to overcome the shortcomings of keyword searches, relies on indexes that identify every word in a document. Users can then run queries against a large group of documents, combining search terms with operators such as AND, OR and NOT. However, users often find that Boolean searches retrieve either too many or too few documents.
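A Boolean search over a full-word index might be sketched as follows. This is a minimal illustration, assuming three made-up documents and Python sets standing in for the index; commercial packages would use their own index formats and query syntax.

```python
import re
from collections import defaultdict

# Hypothetical documents, keyed by a document number.
DOCS = {
    1: "Water district budget for 1995",
    2: "Water quality permit filing",
    3: "Budget memo on computer purchases",
}


def build_index(docs):
    """Map every word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def lookup(word):
    """Return the documents containing a word, or an empty set."""
    return INDEX.get(word.lower(), set())


print(sorted(lookup("water") & lookup("budget")))  # water AND budget
print(sorted(lookup("water") | lookup("budget")))  # water OR budget
print(sorted(lookup("water") - lookup("budget")))  # water NOT budget
```

Even in this tiny example the OR query returns every document while the AND query returns only one, which hints at why Boolean results can swing between too many and too few.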
Another search technique involves statistical systems, which rely on algorithms to determine a document's relevance according to the frequency with which a keyword appears in the document. Taking statistical searches one step further, concept-based searching - sometimes referred to as natural-language searching - allows users to create hierarchies for search terms. For example, the concept term "computer" might retrieve all documents that refer to PCs, mainframes and workstations. When used in combination with statistics, concept searches can be extremely useful.
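The combination of concept and statistical searching can be sketched in a few lines. The example below is only an approximation under stated assumptions: the concept hierarchy, the sample documents and the `rank` function are invented for illustration, and relevance is scored by a simple count of matching words rather than by any vendor's actual algorithm.

```python
import re

# Hypothetical concept hierarchy: a concept term expands to related words.
CONCEPTS = {
    "computer": ["computer", "computers", "pc", "pcs", "mainframe",
                 "mainframes", "workstation", "workstations"],
}

# Hypothetical documents.
DOCS = {
    "a": "The agency bought ten PCs and two workstations. The PCs arrived in May.",
    "b": "One mainframe handles billing.",
    "c": "Water rates rose last year.",
}


def rank(docs, term):
    """Rank documents by how often the term, or its concept members, appears."""
    terms = set(CONCEPTS.get(term.lower(), [term.lower()]))
    scores = {}
    for name, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        hits = sum(1 for word in words if word in terms)
        if hits:
            scores[name] = hits
    # Highest count first; ties keep their original order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rank(DOCS, "computer"))  # documents about PCs, mainframes or workstations
print(rank(DOCS, "water"))     # a plain term with no concept expansion
```

Neither document "a" nor document "b" contains the word "computer", yet both are retrieved and ordered by how often the related terms occur.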
For scanned documents stored in imaging systems, full-text retrieval engines with "fuzzy searching" capabilities work best at overcoming problems created by OCR when it converts document images into text. Despite advancements in the technology, OCR is still finicky and can misspell words. According to Excalibur Technologies - a developer of text retrieval software - fuzzy searching can speed up searches by enabling people to find information even if documents are misfiled or if words in an index are misspelled.
Fuzzy searching is based on adaptive pattern technology, a form of intelligent software that allows users to index and retrieve documents based on repeating patterns in data. In essence, the technology allows people to ask a computer, "Have you seen anything that resembles this?" The object of a fuzzy search doesn't have to be words. A fuzzy search can also look for pictures, a video clip, a fingerprint or any other type of digital data, according to Excalibur.
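The idea of tolerating OCR misspellings can be illustrated with a simple stand-in. The sketch below does not reproduce Excalibur's adaptive pattern technology; it substitutes the string-similarity matcher in Python's standard `difflib` module, and the misspelled index words and the 0.75 similarity cutoff are assumptions chosen for the example.

```python
import difflib

# Hypothetical index words as OCR might have produced them:
# "municipal" was read as "rnunicipal" and "district" as "distnct".
OCR_INDEX = ["rnunicipal", "distnct", "water", "budget"]


def fuzzy_lookup(query, index_words, cutoff=0.75):
    """Return index words that resemble the query, best match first."""
    return difflib.get_close_matches(query.lower(), index_words, n=3, cutoff=cutoff)


print("municipal" in OCR_INDEX)              # an exact match fails
print(fuzzy_lookup("municipal", OCR_INDEX))  # the fuzzy match finds the misspelling
print(fuzzy_lookup("district", OCR_INDEX))
```

Lowering the cutoff would catch more badly garbled words at the cost of more false matches, which is the usual trade-off in any similarity-based search.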
RETRIEVING GOVERNMENT FILES
Market revenues for the text retrieval industry are expected to hit $552 million in 1995, according to Delphi Consulting Group, a document management firm. Government represents almost one-third of that market, but most of that share belongs to the federal sector. State and local governments account for only five percent of the market, according to Delphi, but if the education and library market is included, the non-federal government share rises to 11 percent.
According to Shegda, common text retrieval applications range from litigation support in the legal field to customer service, such as correspondence tracking, to technical document management. In the government sector, text retrieval has helped agencies deal with the administrative burdens of regulatory compliance and corporate filings, such as the Uniform Commercial Code. The technology has also been useful in the field of legislative support.
SAN BERNARDINO VALLEY MUNICIPAL WATER DISTRICT
The San Bernardino Valley Municipal Water District is typical of government agencies that need and use text retrieval. It also represents where state and local governments are headed with the leading-edge technology.
The District is a water wholesaler, providing supplemental water to what Robert Tincher calls its "retail market": the cities of San Bernardino Valley, Calif. Tincher, who is water resource manager for the District, said the agency serves more than 600,000 people and operates on a $20 million annual budget. Running the entire operation