The government’s use of metadata has been thrust into the spotlight with the NSA’s attempt to thwart terrorism by tracking the habits and connections of suspects on watch lists.
But the value of metadata goes beyond the NSA. Metadata is an efficient and cost-effective way to look at unstructured email and documents across data centers, and it is often used by financial services and manufacturing companies to control costs and manage compliance and security.
Through similar use of this information, local, state and federal organizations can comply with a number of regulations, and reduce the associated costs, that would otherwise be too time-consuming, too expensive or nearly impossible to meet.
Unstructured data can be analyzed with legacy information lifecycle management tools purchased last decade. In addition, a number of unstructured data profiling tools now exist that provide defensibility, full metadata analysis across platforms and optional full-text search.
Freedom of Information Act (FOIA), state sunshine laws, open/public records laws, the Data Practices Act and similar regulations require agencies to provide information to journalists and the public in a timely, cost-effective manner.
But when the requested documents are on backup tapes, online disks or old servers, it takes a great deal of time and effort to locate these documents. These requested documents, especially ones that have not been accessed in years, are often mixed in with petabytes of other unstructured documents.
The use of metadata with both offline and online data enables organizations to search by the owner of a document or the time frame in which it was likely created.
Indexing tools can also combine metadata queries to limit the responsive data set. For example, if a fiscal 2006 report would have been created by a particular comptroller in January 2007 as a Microsoft Excel spreadsheet, the search can combine owner, time frame and file type, likely locating the file on a backup tape.
Using metadata to respond to FOIA requests for aged, misplaced or lost data simplifies workflows, reduces labor costs and speeds response.
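As a sketch of how such a combined query might work, the Python snippet below filters a hypothetical metadata catalog by owner, date range and file type. The record fields, sample paths and function name are illustrative assumptions, not the interface of any particular indexing product.

```python
from datetime import date

# Hypothetical metadata records, as an indexing tool might expose them.
catalog = [
    {"path": "tape07/reports/fy2006.xls", "owner": "comptroller",
     "modified": date(2007, 1, 12), "type": ".xls"},
    {"path": "srv01/notes/meeting.doc", "owner": "clerk",
     "modified": date(2010, 5, 3), "type": ".doc"},
]

def responsive(records, owner, start, end, file_type):
    """Combine owner, time frame and file type to narrow the set."""
    return [r for r in records
            if r["owner"] == owner
            and start <= r["modified"] <= end
            and r["type"] == file_type]

# Owner + January 2007 + Excel narrows petabytes to a handful of files.
hits = responsive(catalog, "comptroller",
                  date(2007, 1, 1), date(2007, 1, 31), ".xls")
for r in hits:
    print(r["path"])
```

Because each criterion is applied to the index rather than the file contents, the query runs without restoring or opening the underlying documents.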
Data breaches are a significant concern for private- and public-sector entities alike. A breach of sensitive data can run the gamut from exposing individuals to identity theft to releasing Department of Defense documents that were meant to be sealed.
While protecting data from breaches starts with data center security, encryption, passwords and other security measures, metadata tools and their full-text search capabilities lend an added layer of protection to any organization.
Metadata tools allow agencies to self-audit. Personally identifiable information (PII) tools find patterns within emails, documents, PDFs and email attachments that mimic Social Security numbers, birth dates and other information that can put people at risk. That information can then be encrypted, purged or archived according to agency policy.
Also, searches can be run by keyword. The word "confidential" or other keywords contained within sensitive documents can be uncovered.
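A minimal sketch of this kind of self-audit, assuming simple regular-expression patterns; real PII scanners are far more sophisticated, validating number ranges and suppressing false positives, and the patterns and keywords here are illustrative only.

```python
import re

# Illustrative patterns of the kind a PII scanner might apply.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")       # e.g., 123-45-6789
DOB = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")       # e.g., 04/01/1980
KEYWORDS = ("confidential", "internal use only")  # keyword search terms

def flag(text):
    """Return a list of findings for one document's text."""
    findings = []
    if SSN.search(text):
        findings.append("ssn")
    if DOB.search(text):
        findings.append("dob")
    for kw in KEYWORDS:
        if kw in text.lower():
            findings.append(f"keyword:{kw}")
    return findings

print(flag("CONFIDENTIAL: applicant 123-45-6789, born 04/01/1980"))
# -> ['ssn', 'dob', 'keyword:confidential']
```

Each flagged document can then be routed to encryption, purge or archive according to agency policy.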
Under the Federal Data Center Consolidation Initiative (FDCCI), the government has already consolidated about 500 data centers and plans to close 571 more by September 2014. Despite the best of intentions, the process is well behind schedule and has yet to reduce costs.
Many of the time and cost roadblocks of the FDCCI lie in a lack of knowledge about unstructured data. A 10-year-old data center can hold many petabytes of unstructured files and emails on backup tape, servers and even cloud storage. Deciding what gets migrated and what can be purged is nearly impossible, and migrating dozens or even hundreds of copies of the same file renders consolidation fruitless.
Metadata simplifies the process. Metadata tools analyze the environment and can find duplicate content, which can easily represent 20 to 40 percent of an entire data center. Beyond that, they can find iTunes libraries and other personal, employee-owned content that should not be stored in the data center.
By also identifying aged emails and files belonging to nonessential employees, legal teams can consider purging at least 50 percent of the data center and migrating only the content that has value.
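One common way duplicate content is detected is by hashing file contents and grouping identical hashes. The sketch below illustrates the idea on in-memory sample data; the paths and file bytes are hypothetical, and production tools typically hash files streamed from disk or tape.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample: (path, content-bytes) pairs standing in for files.
files = [
    ("srv01/q3_report.doc", b"quarterly results"),
    ("srv02/copy_of_q3.doc", b"quarterly results"),  # duplicate content
    ("srv01/memo.txt", b"staff memo"),
]

def find_duplicates(items):
    """Group paths by content hash; any group with >1 path is a duplicate set."""
    by_hash = defaultdict(list)
    for path, data in items:
        by_hash[hashlib.sha256(data).hexdigest()].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]

for group in find_duplicates(files):
    print(group)  # each group: one copy to migrate, the rest to purge
```

Keeping a single copy from each group and purging the rest is how deduplication alone can reclaim the 20 to 40 percent of capacity cited above.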
The federal "Cloud First" policy has mandated that the cloud should be the first choice for all new government IT storage purchases.
The cloud offers low-cost and convenient data hosting compared to implementing and managing enterprise storage in the data center.
Along with new data being stored in the cloud, sensitive and requested content can be located and moved to the cloud based on its metadata: aged data, data owned by ex-employees that may have long-term value, project-based content and more.
Legacy backup tapes are also a good source of cloud content. Tape data can be easily profiled and, based on policy, specific content can be extracted and archived in the cloud. Data profiling and extraction do not require the original backup software and deliver a cost-effective way to pull relevant content from legacy tapes.
Metadata provides streamlined support for the government's biggest – and most expensive – projects and can extend itself to e-discovery, management of GIS satellite and cellphone photos, and in-house migration and consolidation projects.
The same metadata indexing tool can even be used across departments to reduce budgets. Metadata is versatile, cost-effective and efficient.
Jim McGann is vice president of information management company Index Engines.