are just 12 staff people.

Like any other bureaucracy, the agency is awash in paper. "We receive and produce lots of paper and have no efficient way to get at the information stored on the documents," commented Tincher. And like a growing number of government agencies, the Water District has reduced its support staff and increasingly relies on computers to do its office work.

A key piece of technology is the document imaging system they use for storing, filing, indexing and retrieving all project files, correspondence, legal documents and financial records. The District uses a hardware platform from Digital Equipment and Excalibur software for document imaging, indexing and retrieval.

According to Tincher, the system lets staff retrieve documents by keyword, Boolean or fuzzy searches. "The fuzzy search capability takes spelling out of the loop," he explained. When a user enters a search, the system returns a list of hits ranked by the number of keyword occurrences in each document. When the user selects an item from the retrieval list, the software brings the document image - not the text file - to the screen.
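
To make the ranking and fuzzy matching described above concrete, here is a minimal sketch in Python. It is not how the Excalibur software works internally, which the article does not describe. It assumes each document's text is available as a plain string, and it uses the standard library's `difflib` as a stand-in for fuzzy matching. The names `search`, `tokenize` and the sample file names are hypothetical.

```python
import re
from collections import Counter
from difflib import get_close_matches


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(documents, query, fuzzy=False, cutoff=0.8):
    """Return (doc_id, hits) pairs, most keyword occurrences first.

    documents: dict mapping a document ID to its text (an assumption;
    a real imaging system would also keep a pointer to the page image).
    fuzzy: when True, words that are merely similar to a query term
    also count, so a misspelled query can still find its target.
    """
    terms = tokenize(query)
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        hits = 0
        for term in terms:
            if fuzzy:
                # difflib stands in for the fuzzy matcher here.
                similar = get_close_matches(term, list(counts), n=5, cutoff=cutoff)
                hits += sum(counts[word] for word in similar)
            else:
                hits += counts[term]
        if hits:
            results.append((doc_id, hits))
    # Highest hit count first; ties are broken by document ID.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results
```

With this sketch, an exact search for a misspelled word such as "resevoir" finds nothing, while the same query with `fuzzy=True` still matches documents containing "reservoir". The similarity cutoff of 0.8 is an arbitrary illustrative choice.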

"The Excalibur system is excellent for our small office, where we can't afford a file clerk," said Tincher. He added that the software finds and retrieves documents very quickly and is easy to operate for casual users.


With more government information being stored electronically, finding files and information becomes more difficult. That makes text retrieval more important than ever. But choosing and integrating retrieval software into new or existing applications requires careful consideration. Shegda cited three key issues that must be addressed: standards, storage requirements and databases.

"You need to make sure that the search engine you choose can read the native file formats you use, such as Microsoft Word or Word Perfect," she said. "Some only provide limited support. If the search engine doesn't support one of your standard formats, then the file has to be translated into ASCII text before it can be indexed and searched," Shegda pointed out.

Storage requirements must be analyzed carefully, because indexes for full-text retrieval require large amounts of storage space. When text retrieval is integrated with an imaging system, storage requirements can be considerable. These integrated systems typically have two databases: one for the full text and another containing keywords and pointers to the document image files. Search and retrieval times can slow down when the databases are large.
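
The two-database arrangement can be pictured with a short Python sketch. This is an illustration under stated assumptions, not a description of any vendor's product: the class name `ImagingStore`, its methods and the image paths are hypothetical, and both "databases" are simple in-memory dictionaries.

```python
import re
from collections import defaultdict


class ImagingStore:
    """Toy model of a text-retrieval system paired with an imaging system."""

    def __init__(self):
        # Database 1: full-text index, mapping each word to document IDs.
        self.full_text_index = defaultdict(set)
        # Database 2: keywords plus a pointer to each document's image file.
        self.catalog = {}

    def add(self, doc_id, text, keywords, image_path):
        """Index every word of the text and record the image pointer."""
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.full_text_index[word].add(doc_id)
        self.catalog[doc_id] = {
            "keywords": list(keywords),
            "image_path": image_path,
        }

    def find_images(self, word):
        """Look a word up in the full-text index, then follow the pointers."""
        doc_ids = self.full_text_index.get(word.lower(), set())
        return sorted(self.catalog[d]["image_path"] for d in doc_ids)

    def index_entries(self):
        """Count (word, document) pairs, a rough measure of index size."""
        return sum(len(ids) for ids in self.full_text_index.values())
```

Because every distinct word in every document adds an entry, `index_entries()` grows with the size of the collection, which is one way to see why full-text indexes demand so much storage and why large databases can slow searches.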


Where's text retrieval headed? Well, it probably won't be called "text" retrieval in about five years, according to Shegda. By then, retrieval technology - combined with more mature adaptive pattern recognition capabilities - will be more active in the multimedia environment. Also, retrieval tools will be more of a commodity item, embedded in word processing software or even operating systems, making them accessible to virtually all computer users. That's good news for government information gatherers.