In what is one of the more ironic twists of fate in the short but fast life of the Internet, the most popular Web sites have nothing to do with entertainment, education or shopping, but with finding information. These sites, with names like Alta Vista, Excite, Infoseek, Lycos and Yahoo, have become so popular that the companies running them are the darlings of Wall Street, with stock valuations in the billions of dollars.
But as anyone who has used these search sites knows, the results can be frustrating. While simple to use, the search engines cough up a laundry list of sites and documents, many of which have little to do with the actual subject matter.
"The search engines on the Web, such as Yahoo and Excite, are lightweight, meant for recreational use only," explained Ron Weissman, vice president of strategies and corporate marketing for Verity Corp., a search engine software firm. "They only search HTML (Hy-perText Markup Language) tags and text."
The poor accuracy of Internet search engines may lead some organizations to wonder about the practicality of search and retrieval systems if workers have a difficult time finding what they want. But Weissman says not to worry. Highly sophisticated search engines are available that understand business rules, recognize word patterns and search for thematic concepts in the hunt for the right information.
These quasi-intelligent tools can be used to search everything from intranets, document management systems and word processing files, to newsfeeds, e-mail and even graphical images and video.
How the engines search varies from product to product, although most rely on several recognized query techniques. Verity's core product, Verity Information Server, indexes, searches and retrieves information stored in a variety of formats on Web and file servers. Information Server uses a combination of query operations, including Boolean, proximity, frequency, concept, weighted and fuzzy word searching. It also provides a selection of query formats, such as commas that string together keywords, to help workers find the needle in the haystack.
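To make the idea of a Boolean query concrete, here is a minimal sketch in Python. It is not how Information Server works internally, which the article does not describe. It assumes a simple inverted index (a map from each word to the documents containing it), and the sample documents and function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


# Hypothetical sample data, for illustration only.
documents = {
    "memo": "Budget report for the sales team",
    "faq": "Sales questions and answers",
    "news": "Quarterly budget news",
}
index = build_index(documents)
```

With this sample data, `boolean_and(index, "budget", "sales")` selects only the memo, while `boolean_or` with the same terms selects all three documents.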
Fuzzy searching involves pattern matching so that a search for "legislature" will also recognize misspelled versions of the same word, such as "ligislature" or "legisleture." Misspellings often crop up in documents scanned into computers using optical character-recognition software.
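One common way to implement this kind of tolerance for misspellings is edit distance, which counts the single-character changes needed to turn one word into another. The sketch below uses Levenshtein distance. This is an assumption for illustration, since the article does not say which pattern-matching method any vendor uses, and the function names and the `max_distance` threshold are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, words: list[str], max_distance: int = 2) -> list[str]:
    """Return the words within max_distance edits of the query (case-insensitive)."""
    query = query.lower()
    return [w for w in words if edit_distance(query, w.lower()) <= max_distance]
```

Both misspellings from the example above, "ligislature" and "legisleture," are one substitution away from "legislature," so a threshold of one or two edits would catch them.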
Proximity searches look for two or more words in the same sentence or paragraph or for one word that's within two or three words of another. Concept searching looks at the words in a document to infer what the document is about. This technique avoids the problem with keyword searches, which can't tell the difference between a Macintosh apple and the Macintosh computer. To aid those navigating through a search, Information Server will rank hits by relevancy, highlight keyword terms, and cluster and summarize search results.
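The word-distance form of a proximity search can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: it assumes plain text split into lowercase words, and the function names and the `max_gap` parameter are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_proximity(text: str, first: str, second: str, max_gap: int = 3) -> bool:
    """True if the two words occur within max_gap word positions of each other."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first.lower()]
    second_positions = [i for i, tok in enumerate(tokens) if tok == second.lower()]
    return any(
        abs(i - j) <= max_gap
        for i in first_positions
        for j in second_positions
    )
```

A search for "Macintosh" near "computer" would then match a sentence such as "The Macintosh computer shipped with a new operating system," where the two words are adjacent. It would not match "Macintosh" near "system" in the same sentence, because the words are seven positions apart.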
Add-on tools, such as Verity's Intranet Spider, which indexes Web sites and file servers at a rapid clip, and the Agent Server, which monitors the Internet and intranet sites and automatically retrieves information based on preset profiles, give an organization an arsenal of tools for finding just the right piece of information without wasting too much time.
Other search engine leaders, such as Excalibur and PC Docs/Fulcrum, offer similar features but have carved out different niches. Excalibur's RetrievalWare search software has built up a following for its image and video search capability. SearchServer, from PC Docs/Fulcrum, is considered highly scalable, capable of conducting searches across many millions of documents.
What's important to remember, according to Weissman, is that search engines available to organizations are much more efficient, effective and productive than their more popular brethren found on the Internet. More important, the cost barrier to these high-end search engines has dropped in recent years. "The market is big enough now that the technology has come down in price," Weissman said. "The