PROBLEM/SITUATION: How to put government data on the Internet without translating it into a single common format.
SOLUTION: Server with Web browser interface and text retrieval engine.
JURISDICTION: Washington State, Washington Office of the Administrator of the Courts, Washington Department of Social and Health Services.
VENDORS: Apple Computer, Microsoft, Aldus, Netscape, Adobe, WebStar, WordPerfect, FrameMaker, Ragtime, Nisus, PICT.
CONTACT: Larry Hewitt,
"Publish once" is one of the new buzzwords on the Internet: place your internal or public documents on the Internet in a usable format, and shift the burden and cost of reproduction to the end user. It is distributed documentation, and it sounds so easy.
But what do you do with the thousands of existing documents already in your system? How do you publish legacy archives of data? Do you spend the money to create yet another archive and retrieval system for the Internet? Worse yet, do you decide to translate all those documents into HTML, plain text or Adobe Acrobat Portable Document Format (PDF)? And what does it cost to convert and then maintain duplicate sets of documents?
These questions were recently put to the Washington State Department of Information Services Strategic Initiatives Group by the Department of General Administration (GA). Here is the problem in a nutshell:
GA handles hundreds of procurement documents defining the products approved for purchase by state agencies, from cameras to toilet paper. GA wanted to make these documents searchable over a network so they would reach the widest possible audience. The solutions GA initially researched would have required converting the existing documents into compatible formats. Maintaining two sets of data on an ongoing basis, the original source documents and the proprietary searchable copies, would have involved considerable additional expense.
GA asked Strategic Initiatives to determine if there were any suitable technologies that met the following objectives:
* Documents would be searchable using readily available network tools.
* Documents would be retrievable as text and thereby available for use by customer workstations.
* Documents would be processed by the search engine without being translated into additional formats.
The document inventory might also include original source documents created in many different programs and on multiple platforms.
This request for a strategic technology analysis happened to coincide with a seminar Apple Computer presented on its latest Internet workgroup servers running on the PowerPC platform. At this seminar, the Strategic Initiatives Group learned about AppleSearch, Apple Computer's text retrieval engine, and its Web interface, which is bundled with the Internet Server package. I secured an Apple 9500 workgroup server on loan from Apple Computer and set up a test server inside the DIS firewall to evaluate AppleSearch.
Bringing the Apple PowerPC server up on the Internet was very simple. The software was virtually preconfigured, and once I installed a Token Ring card and connected the server to the internal network, the WebStar server software was operational in just a few minutes. By day's end, after reading the AppleSearch documentation and experimenting with the software, I had a working directory of 10 different file types, from three different computer platforms, available for searching and retrieval through the Web interface.
I then pushed the software to its limits, and beyond, to find out what limitations it would show in a real working environment. I threw some very strange files at AppleSearch, including Microsoft PowerPoint and Aldus Persuasion presentations, proprietary help files from internal documentation, and every kind of text-derived file I could find in the suite of software tools at my disposal.
Since the interface was Web browser-based, there was no difference between using the search engine from a Windows PC, a