Master data management (MDM) is becoming increasingly familiar in the data management field, and in the sectors of business and government where large enterprise information systems are most common. The number of vendors coming to market with MDM tools has risen sharply in the past couple of years, as has the number of consultants offering professional services around MDM and the number of articles written about the subject.
With all the buzz, CIOs wonder whether MDM is hype or simply a repackaging of old ideas. In reality, MDM builds on a perspective of data management that has been ignored in the past, with disastrous results. MDM should matter tremendously to large and complex organizations for which data sharing and exchange are necessary.
Importance of MDM
MDM would not be very significant if all data were confined to the operational silos that produced it. However, for many years, data management experts have widely promoted the idea of data as a strategic asset for organizations. Data can be used to increase efficiency and effectiveness, reduce risk, and meet external mandates. To do this, however, data must be freed from the silos where it is created and shared across the enterprise, or even beyond enterprise boundaries.
And technology has accommodated this need to exchange data. The scalability of hardware and software has grown enormously. Specialized tools for moving data, such as extract, transform and load (ETL) tools and middleware products, have been extremely successful. Conceptual frameworks, such as data warehouses and marts, have grown and matured into an array of products and methodologies. All this, however, has been like building a very intricate, expensive plumbing network without worrying too much about what's flowing inside the pipes.
Although statistics are difficult to come by, there's a widespread perception that many data warehouses and marts have failed to meet expectations. Anecdotal information presented at meetings of the Data Management Association indicates that more than 50 percent of such projects are perceived as failures. And a surprising number of the projects get canceled outright.
The problems stem not from technical difficulties or unmanaged expectations, but from complex data quality issues that are discovered only after data is brought together from disparate sources. These problems are severe when they occur in the most shared types of data, or master data.
The situation is hardly better in the domain of transaction exchange and messaging. Straight-through processing (STP) has been a goal of the financial services industry for years. The concept has been to make all the processing involved in the buying and selling of stocks and bonds happen in real time, or at least within 24 hours of a trade. STP involves linking the systems that do the order entry, the matching of buyers and sellers, the delivery of the securities, the accounting and so on. For real-time execution, transactions containing appropriate data must be sent from one system to another.
The entire STP initiative has been greatly hampered by data quality issues and remains largely unimplemented. The primary cause, according to a survey by Capco, a consulting firm, has been master data quality problems that lead to high levels of transaction rejections.
Data is the only resource available to an enterprise that can be used without being consumed. The downside is that data with quality problems and a high reuse rate quickly spreads these problems throughout the enterprise, resulting in bad business decisions and inaccurate regulatory reporting.
Master data is the most shared data, and so must have extremely high quality or it will easily negate the huge investment put into data sharing and exchange projects. Worse yet, such problems are only revealed after the investment has been made to do the projects.
Periodic surveys of all the major master data product vendors consistently reveal that the drive to implement MDM is coming from business users and not IT departments. This indicates that business users have identified master data as the primary source of the problems in failed data sharing and exchange projects in their enterprises.
Master Data Defined
Perhaps the most dangerous idea in MDM is that master data is a homogenous set of data to which one-size-fits-all management techniques can be applied. If we look closely at master data, we see it consists of separate subclasses, each with its own special properties and behaviors, and thus unique management needs.
We can separate an enterprise's data resource into different layers if we consider the data from the viewpoint of supporting transactions in operational systems. If we look at all of the databases in an enterprise, it's possible to discern a pattern in the different types of data tables they contain. At the top of this hierarchy is the metadata layer containing the definitions of database tables and columns. This metadata should stay unchanged for the life span of the database it describes. Any data quality problem, such as a column whose size is too small, can have a tremendous impact.
Below metadata we have reference data. This is also known as code tables, domain values or valid values, and it is an important subclass of master data. Examples include country, currency, product type and customer credit status. These database tables typically consist of a code column and a description column, and usually only a few records. Reference data tables are the Rodney Dangerfield of the data world. They get no respect. They are thought of as being small, simple and stagnant. Yet they are very important, and can make up anywhere from 20 percent to 50 percent of the number of tables in a database.
Reference data can be defined as follows:
Any kind of data that is used solely to categorize other data found in a database, or to relate data in a database to information beyond the boundaries of the enterprise.
Some unique properties and behaviors of reference data that set it apart from other classes of data:
Codes drive business rules. If a business rule includes a data value found in a database, this will almost certainly be a code value from a reference data table.
Reference data tables must be fully and accurately populated when an application is being developed -- long before it goes live.
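These two properties can be illustrated with a minimal sketch. The table contents below (country codes, credit status codes) are illustrative assumptions, not data from the article; the point is the shape of reference data, a code column plus a description column with only a few records, and the way a business rule is expressed in terms of a code value.

```python
# Reference data sketch: small code tables represented as dictionaries.
# In a real system these would be database tables populated before go-live.

# Country reference data: a code column and a description column,
# typically only a handful of records.
COUNTRY_CODES = {
    "US": "United States",
    "CA": "Canada",
    "DE": "Germany",
}

# Customer credit status codes (illustrative values).
CREDIT_STATUS = {
    "A": "Approved",
    "H": "On hold",
    "R": "Rejected",
}

def can_ship_order(credit_status_code: str) -> bool:
    """Business rule: orders ship only for approved customers.

    As the text notes, when a business rule depends on a data value,
    that value is almost always a code from a reference data table.
    """
    if credit_status_code not in CREDIT_STATUS:
        raise ValueError(f"Unknown credit status code: {credit_status_code}")
    return credit_status_code == "A"
```

Note that the rule is meaningless unless the code table is fully populated first, which is why reference data must be in place long before an application goes live.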
Next in the data hierarchy comes transaction structure data. This includes the two old favorites "product" and "customer." Transaction structure data tables define the parties to the transactions that an enterprise processes in its operational systems. For instance, if I buy a book online, the product information for the book has to be present, as do my customer details. Transaction structure data is another subclass of master data, but it is quite different from the reference data tables discussed above. It can be defined as follows:
Transaction structure data represents the direct participants in a transaction that must be present before a transaction fires.
Some of its unique properties and behaviors include:
Transaction structure data tables usually contain more data elements (fields or columns) than other tables in a database.
Many of the data elements in transaction structure data tables have complex relationships. For instance, if product type is "domestic appliance," then "working voltage" must be populated; otherwise "working voltage" must be blank.
Transaction structure data tables must be populated with information before the transactions they support can be fired.
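The inter-element relationship described above can be sketched as a validation routine. The field names (`product_type`, `working_voltage`) are illustrative assumptions chosen to match the example in the text, not a defined schema.

```python
# Sketch of a complex inter-field rule on transaction structure data:
# if product type is "domestic appliance", working voltage must be
# populated; otherwise it must be blank.

def validate_product(record: dict) -> list[str]:
    """Return a list of rule violations for a product record."""
    errors = []
    product_type = record.get("product_type")
    working_voltage = record.get("working_voltage")

    if product_type == "domestic appliance":
        if not working_voltage:
            errors.append(
                "working_voltage must be populated for domestic appliances"
            )
    elif working_voltage:
        errors.append("working_voltage must be blank for non-appliances")
    return errors
```

Rules like this one are what make transaction structure tables so much harder to manage than simple code tables: the quality of one column depends on the value of another.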
The final subclass of master data is enterprise structure data. Although these tables are not present in every database, they are important and can be defined as follows:
Enterprise structure data is data that permits business activity to be reported or analyzed by business responsibility.
Examples include: "chart of accounts" and "organization structure." Unique properties and behaviors of this subclass of master data include:
Enterprise structure data is typically very hierarchical.
Enterprise structure data evolves over time, presenting challenges for the reporting of historical business activity in the current structure.
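The hierarchical character of enterprise structure data can be sketched as follows. The organization units and their parent-child links here are invented for illustration; the point is that reporting by business responsibility means walking a hierarchy.

```python
# Sketch of hierarchical enterprise structure data: an organization
# structure stored as child -> parent links (illustrative unit names).

PARENT = {
    "Retail Sales": "Sales",
    "Online Sales": "Sales",
    "Sales": "Corporate",
    "Finance": "Corporate",
}

def rollup_path(unit: str) -> list[str]:
    """Walk from a business unit up to the top of the hierarchy.

    Reporting activity "by business responsibility" means aggregating
    along a path like this one.
    """
    path = [unit]
    while unit in PARENT:
        unit = PARENT[unit]
        path.append(unit)
    return path
```

Because such hierarchies evolve over time, a real implementation would also have to version these links, which is what makes reporting historical activity in the current structure so challenging.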
Below the three levels of master data are transaction activity data, the traditional focus of information technology, which represents the actual transactions flowing through operational systems. Alongside it sits transaction audit data, which tracks the progress of each transaction from initiation to termination.
Beyond Master Data
Without a clear understanding of master data, danger lurks. One problem is uncontrolled scope expansion in MDM projects. The inability to set clearly defined boundaries for master data can lead to expectations that a far greater set of data will be tamed by an MDM project than is practical. Stakeholders across an enterprise may genuinely believe that many of their data problems will be resolved via MDM when this is simply impossible.
Another danger arising from an unrealistic view of master data is the assumption that a single set of management techniques applies to all of it. Some vendors say MDM is nothing more than time-honored basic data management techniques in a new package. This is simply not the case. For example, the need to manage the semantic content of codes in reference data tables is critical. It requires gathering precise definitions of code values and implementing a knowledge management infrastructure that permits anyone in the enterprise to access those definitions.
Master data has now been recognized as critical to data sharing and exchange in enterprises. Yet it is not a homogenous class of data but consists of several different subclasses. The need for MDM to enable sharing of high-quality master data is more urgent than ever. However, unless attention is paid to the different management needs of the different kinds of master data, generalized attempts to implement MDM may be doomed from the start.