Many pleasant words are used to describe information infrastructures -- "heterogeneous environment," "mixed platforms," or "multiple architectures." But to be honest, when information is needed from several offices or departments, most infrastructures are best described as a "mess."
Legacy systems here, midrange boxes there and PCs everywhere. Even putting all the hardware on the same network doesn't get the information out and into a unified view. Some of the information is available through distributed file systems; some is obtained from homegrown scripts; then there's the information that's collected by e-mailing Betty in bookkeeping and asking her to "please print the monthly report -- you know the one you do every month at this time -- and e-mail it back to me."
Client/server technology was supposed to handle the problem, but it often just added another system to the mix. The whole structure should be redesigned from the bottom up or top down, but who has the time or money to step back and do it? Even worse, technology is changing so fast that planning often seems outdated before it's approved, let alone put into action.
This may be overstating the problem, but it's not uncommon to find information infrastructures with many data sources, each of which contains a piece of the puzzle needed to put together a complete picture. The problem for IT is to get at the data, translate it into a common format and present it in a coordinated, timely manner. Datawarehousing has been one solution, but datawarehouses also have disadvantages.
According to Kevin Strange of the GartnerGroup, datawarehouses require tremendous upfront planning, cannot be expected to show results for seven to 12 months and may cost upwards of $10 million (see Government Technology, Emerging Technology Handbook, June 1996). What's more, new data sources are coming online continuously, whether it's the new accounting package in the billing department or another agency's Web site. Adding a data source to a datawarehouse can take time.
Because datawarehouses work by taking "snapshots" of existing data and replicating it into a central repository, their internal structures are often fixed. They need to be designed in great detail before implementation because redesigning and re-populating a datawarehouse after it's been filled with several years of data can be very time-consuming and expensive.
Enterworks Inc. is taking a new approach with its Virtual DB product. Instead of replicating existing data into a datawarehouse, Virtual DB builds a "virtual database," using existing sources of information as building blocks.
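The difference between replicating data and federating it can be sketched in a few lines of Python. Everything below is invented for illustration -- the sample sources, field names and join helper are hypothetical, not Virtual DB's actual mechanics: the point is simply that a catalogue of where each field lives lets a query be answered from the live sources, with nothing copied into a central store.

```python
# Hypothetical sketch of a "virtual database": a meta-catalogue records
# which source holds which fields, and a query is answered by joining
# the live sources rather than a replicated central warehouse.

# Stand-ins for two departmental databases (in the article, the clerk's
# Sybase system and the tax office's Ingres system). All data is invented.
clerk_db = [
    {"first": "Ann", "last": "Lee", "address": "12 Oak St", "party": "Ind."},
    {"first": "Bob", "last": "Cruz", "address": "9 Elm Ave", "party": "Dem."},
]
tax_db = [
    {"first": "Ann", "last": "Lee", "plot": "44-A", "assessed": 120000},
]

# The meta-catalogue: a map of what data exists and where to get it.
meta_catalogue = {
    "clerk": {"source": clerk_db, "fields": ["first", "last", "address", "party"]},
    "tax":   {"source": tax_db,   "fields": ["first", "last", "plot", "assessed"]},
}

def virtual_join(catalogue, key_fields):
    """Merge records across all catalogued sources on the key fields,
    keeping only those present in every source (an inner join)."""
    merged, counts = {}, {}
    for entry in catalogue.values():
        for row in entry["source"]:
            key = tuple(row[f] for f in key_fields)
            merged.setdefault(key, {}).update(row)
            counts[key] = counts.get(key, 0) + 1
    return [rec for key, rec in merged.items()
            if counts[key] == len(catalogue)]

rows = virtual_join(meta_catalogue, ["first", "last"])
# Ann Lee appears in both sources, so her merged record carries the
# clerk's party registration alongside the tax office's plot number.
```

A real federation layer would, of course, push queries down to the source databases and enforce their permissions rather than iterate over rows in memory, but the data stays where it is in both cases.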
"It [Virtual DB] started in the intelligence community for the National Security Agency, whose information infrastructure is not unlike everyone else's out there," said Bob Lewis, president of Enterworks. "They had a whole variety of different systems and they wanted to unite that information. We built Virtual DB with them in mind, in response to their initial requirement, but it was funded by Telos."
The first step in building a virtual database is to tell Virtual DB where to find data sources. It goes out across the network and maps the existing systems, gathering information on the tables, fields and permissions that exist in running databases such as Oracle, Sybase and Informix, or in text-based files. It summarizes the types of information in these disparate sources in a "meta-catalogue," which is simply a map of what data exists and how to get at it. For example, the meta-catalogue might show that the city clerk's Sybase database contains fields for first and last name, address and political party registration. It would also show that the property tax office's Ingres database has fields for first and last name, plot number, year purchased, assessed value and outstanding tax balance. When the database administrator looks at the meta-catalogue after this data-gathering phase, he sees a map of "what's out there." This by itself can be very helpful, but