In what is one of the more ironic twists of fate in the short but fast life of the Internet, the most popular Web sites have nothing to do with entertainment, education or shopping, but with finding information. These sites, with names like Alta Vista, Excite, Infoseek, Lycos and Yahoo, have become so popular that the companies running them are the darlings of Wall Street, with stock valuations in the billions of dollars.
But for anyone who has used these search sites, the results can be frustrating. While simple to use, the search engines cough up a laundry list of sites and documents, many of which have little to do with the actual subject matter.
"The search engines on the Web, such as Yahoo and Excite, are lightweight, meant for recreational use only," explained Ron Weissman, vice president of strategies and corporate marketing for Verity Corp., a search engine software firm. "They only search HTML (HyperText Markup Language) tags and text."
The poor accuracy of Internet search engines may lead some organizations to wonder about the practicality of search and retrieval systems if workers have a difficult time finding what they want. But Weissman says not to worry. Highly sophisticated search engines are available that understand business rules, recognize word patterns and search for thematic concepts in the hunt for the right information.
These quasi-intelligent tools can be used to search everything from intranets, document management systems and word processing files, to newsfeeds, e-mail and even graphical images and video.
How the engines search varies from product to product, although most combine several well-established query techniques. Verity's core product, Verity Information Server, indexes, searches and retrieves information stored in a variety of formats on Web and file servers. Information Server combines Boolean, proximity, frequency, concept, weighted and fuzzy word searching. It also offers a choice of query formats, such as a simple string of keywords separated by commas, to help workers find the needle in the haystack.
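To make the Boolean part of that list concrete, here is a minimal sketch, assuming a tiny in-memory inverted index (a map from each word to the set of documents containing it). The names `build_index` and `boolean_query` are hypothetical; this is not how Verity's product stores its index or phrases its queries.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical helper: map each lowercase word to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_query(index, all_ids, must=(), any_of=(), must_not=()):
    """AND every term in `must`, require at least one `any_of` term (OR), then drop `must_not` hits."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term, set())
    if any_of:
        union: set[str] = set()
        for term in any_of:
            union |= index.get(term, set())
        result &= union
    for term in must_not:
        result -= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "d1": "School board budget hearing",
        "d2": "School policy on search and seizure",
        "d3": "Budget for road repair",
    }
    idx = build_index(docs)
    # school AND (budget OR policy) AND NOT seizure
    print(boolean_query(idx, docs, must=["school"], any_of=["budget", "policy"], must_not=["seizure"]))
```

Set intersection, union and difference map directly onto AND, OR and NOT, which is why inverted indexes make Boolean queries cheap even over large collections.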
Fuzzy searching involves pattern matching so that a search for "legislature" will also recognize misspelled versions of the same word, such as "ligislature" or "legisleture." Misspellings often crop up in documents scanned into computers using optical character-recognition software.
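One common way to implement this kind of fuzzy matching is edit distance: count the single-character insertions, deletions and substitutions needed to turn one word into another, and accept words within a small threshold. The sketch below assumes Levenshtein distance and a threshold of two edits; the source does not say which algorithm Verity uses, and `fuzzy_matches` is a hypothetical name.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row to keep memory at O(len(b))."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, words: list[str], max_edits: int = 2) -> list[str]:
    """Return the words within `max_edits` edits of `term` (assumed threshold)."""
    term = term.lower()
    return [w for w in words if edit_distance(term, w.lower()) <= max_edits]


if __name__ == "__main__":
    candidates = ["ligislature", "legisleture", "legislation", "literature"]
    print(fuzzy_matches("legislature", candidates))
```

Both OCR-style misspellings are one substitution away from "legislature," so they match, while "legislation" (three edits) does not. Raising the threshold catches noisier scans at the cost of more false hits.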
Proximity searches look for two or more words in the same sentence or paragraph, or for one word within two or three words of another. Concept searching examines the words in a document to infer what the document is about. This avoids a basic weakness of keyword searches, which can't tell the difference between a Macintosh apple and a Macintosh computer. To help users work through the results of a search, Information Server ranks hits by relevance, highlights keywords, and clusters and summarizes the results.
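The two proximity tests described above can be sketched in a few lines. This is a minimal illustration, assuming word positions come from a simple regular-expression tokenizer and that sentences end at `.`, `!` or `?`; `near` and `same_sentence` are hypothetical names, not Verity operators.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase words, ignoring punctuation (an assumed, deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text: str, word_a: str, word_b: str, max_gap: int = 3) -> bool:
    """True if some occurrence of word_a is at most `max_gap` word positions from word_b."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def same_sentence(text: str, word_a: str, word_b: str) -> bool:
    """True if both words occur in one sentence, using a naive split on end punctuation."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        if word_a.lower() in tokens and word_b.lower() in tokens:
            return True
    return False


if __name__ == "__main__":
    text = "The city council approved the new budget. Road repair was delayed."
    print(near(text, "council", "budget"))            # four positions apart
    print(near(text, "council", "budget", max_gap=4))
    print(same_sentence(text, "budget", "road"))
```

Note that `near` works on word positions alone and ignores sentence boundaries, so "budget" and "road" count as adjacent even though they sit in different sentences; that is the gap the sentence-level test covers.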
Add-on tools give an organization an arsenal for finding just the right piece of information without wasting time. Verity's Intranet Spider indexes Web sites and file servers at a rapid clip, while the Agent Server monitors Internet and intranet sites and automatically retrieves information based on preset profiles.
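The idea behind profile-based retrieval is a standing query that is checked against each new document as it arrives, rather than a one-off search. The sketch below assumes a profile is just a set of required keywords; the `Profile` class and `route` function are hypothetical and say nothing about how Verity's Agent Server actually defines or evaluates profiles.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical standing query: who wants the results and which keywords must appear."""
    owner: str
    keywords: set[str]
    matched: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Normalize keywords so matching is case-insensitive.
        self.keywords = {k.lower() for k in self.keywords}


def route(profiles: list[Profile], doc_id: str, text: str) -> list[str]:
    """File a new document with every profile whose keywords all appear in it."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    owners = []
    for profile in profiles:
        if profile.keywords <= words:  # subset test: every keyword is present
            profile.matched.append(doc_id)
            owners.append(profile.owner)
    return owners


if __name__ == "__main__":
    profiles = [
        Profile("claims", {"workers", "compensation"}),
        Profile("legislative", {"bill", "education"}),
    ]
    print(route(profiles, "n1", "New workers' compensation rules take effect"))
```

Because each incoming item is tested once against a list of stored profiles, the cost grows with the number of profiles rather than with the size of the archive.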
Other search engine leaders, such as Excalibur and PC Docs/Fulcrum, offer similar features but have carved out different niches. Excalibur's RetrievalWare search software has built up a following for its image and video search capability. SearchServer, from PC Docs/Fulcrum, is considered highly scalable, capable of searching across many millions of documents.
What's important to remember, according to Weissman, is that search engines available to organizations are much more efficient, effective and productive than their more popular brethren found on the Internet. More important, the cost barrier to these high-end search engines has dropped in recent years. "The market is big enough now that the technology has come down in price," Weissman said. "The technology is no longer so specialized to be out of reach of most state and local government agencies."
Sea of Public Documents
Search and retrieval solutions have caught on in the federal government, where electronically stored documents number in the millions. Also driving demand is the Freedom of Information Act, which has forced federal agencies to open up vast repositories of information to the public.
The state and local sector has been slower to adopt the technology, but that's beginning to change. Weissman sees the strongest activity happening in three key areas: property, law enforcement and procurement. Much of the demand for information in these areas is coming from the public, either directly or indirectly. Other areas of interest include human services and transportation.
One example can be found in Oklahoma City, which uses Excalibur's RetrievalWare to search for information across a battery of servers and storage devices (see Government Technology, October '98). There, the city is still refining how it will use the powerful technology. But at the Texas Association of School Boards (TASB), based in Austin, search and retrieval has been part of the daily workload since January last year. TASB's membership includes all 1,047 Texas school districts, as well as a host of educational service centers, cooperatives and junior colleges. TASB provides training and an array of services including legal, policy and personnel support.
The association installed Verity's Information Server to assist with three document-intensive applications on TASB's intranet. The claims administration group for workers' compensation uses Verity to find documents pertaining to specific claims summaries, medical advisories and appeals. Another application contains information from the Texas Legislature. Various bills and statutes are classified in directories and then indexed for full-text searching. Members can use their browsers to quickly search and find legislative information pertinent to their district or field of public education.
A third application perhaps best demonstrates the power of today's search and retrieval technology. Individual school district policies, once kept in huge paper manuals (more than 1,000 pages for the large school districts), are now available on TASB's Web intranet. The policies are kept in one repository, but are classified by the search engine into individual directories for each school district. Verity's Intranet Spider indexes more than 80,000 documents nightly to keep information current.
What this means, according to Brent Stackhouse, TASB's tech analyst, is that when a member starts searching for information on a specific school policy, such as search-and-seizure rules, the search engine pulls up information pertinent to that district, not information from the other 200 school districts that currently have their manuals in the system. "This application is a huge time-saver and we know the [policy] information is current," Stackhouse said. "We don't have to shuffle paper anymore."
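The per-district behavior comes from combining classification with search: the query is restricted to one district's directory before any text is matched. Here is a minimal sketch, assuming the shared repository is laid out as `districts/<district-id>/<file>`; that path layout, the district IDs and `search_district` are all hypothetical, not TASB's actual structure.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath


def search_district(repo: dict[str, str], district: str, term: str) -> list[str]:
    """Return paths in one district's directory whose text contains `term` as a word."""
    term = term.lower()
    hits = []
    for path, text in repo.items():
        parts = PurePosixPath(path).parts  # e.g. ("districts", "isd-001", "search.txt")
        in_district = len(parts) >= 3 and parts[0] == "districts" and parts[1] == district
        if in_district and term in re.findall(r"[a-z0-9]+", text.lower()):
            hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    repo = {
        "districts/isd-001/search.txt": "Search and seizure of student property",
        "districts/isd-002/search.txt": "Search and seizure rules",
        "districts/isd-001/dress.txt": "Student dress code",
    }
    print(search_district(repo, "isd-001", "seizure"))
```

Filtering on the directory first means a member from one district never sees another district's version of the same policy, even though all the manuals share one repository.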
Opening Up Portals
While search and retrieval is making its way into targeted applications within government, the technology is already beginning to evolve in two interesting directions. First, vendors are introducing specialized search engines, including some that can shop, find a video or perform a precise task based on artificial intelligence.
Second, Web portals aiding in the art of information navigation are appearing on the Internet and on intranets. Portal software vendors, such as Autonomy and Plumtree, both based in San Francisco, offer customized versions of portals found at popular Web sites, such as Yahoo, Excite and Netscape. Verity has also introduced a new product, called the Knowledge Organizer, which acts like a portal.
What these new software tools provide is a powerful, yet simple method of accessing information in the form of documents, images, e-mail and Web pages. Portals also provide the tools to search for the right kind of information and to automatically retrieve news in a timely fashion.
The key word here is navigation. And be sure to use these other key words on your next search: time-saving and knowledge.