The big data marketplace is evolving rapidly. Here’s a look at some of the players.
Big data. The term is spreading through the industry like wildfire, with vendors popping up in droves, each claiming its latest solution will help agencies increase efficiency. The concept seems simple enough: technology grabs data sets from a variety of systems and kicks back usage trends and other patterns that government leaders can use to make better decisions.
But with so many products in the marketplace, getting a handle on all the options available can be a headache for even the savviest CIO. To help cut through the confusion, Government Technology took a look at 10 big data solution providers. Whether your data resides on an open source framework such as Apache Hadoop or a proprietary database system, the following list — presented in alphabetical order — offers a snapshot of the types of big data storage and analytical technology available today.
What it does: CommVault’s Simpana Platform lets users analyze, back up, recover, replicate, archive and search data across their enterprise and across any storage device, according to the company. The platform includes Simpana OnePass, which combines archiving, backup and reporting into a single process intended to reduce operational complexity and cost. The products are designed to work on large-scale, petabyte-level file systems and Microsoft Exchange messaging environments.
How it’s different: Emily Wojcik, CommVault’s senior manager of product marketing, said the technology reduces scan times because backup, archiving and reporting are performed as a single operation, improving efficiency. “What makes it very different from other vendors out there is that we’ve built this whole platform for data and information from the ground up,” Wojcik added. “No acquisitions.”
Reference customer: The Afognak Native Corp., a quasi-public organization formed in 1971 to conduct business on behalf of Alaska’s indigenous people, is a Simpana user. The corporation needed simpler, faster searching and retrieval of data to meet legal discovery requests. In addition, it wanted to improve its disaster recovery.
Using Simpana 9 software from CommVault, the corporation can now recover backed-up data within 75 minutes, and has an improved e-discovery system that integrates with Microsoft’s Windows Azure cloud platform.
What it does: EMC bills its Isilon platform as highly scalable storage for the big data era. Isilon is a storage and management solution for file-based, unstructured data such as audio content, video footage, large home directories, massive log files and analytical data in general. Capacity can expand from a few terabytes to 20 petabytes depending on need, the company says.
How it’s different: Audie Hittle, federal CTO of EMC Isilon, said scalability is Isilon’s calling card for public-sector customers, because its architecture lets them add capacity without rebuilding or replacing systems.
Reference customer: Although Hittle couldn’t name Isilon’s premier public-sector customer, he described it as an “intelligence wing of a federal agency.” He said the organization used Isilon to consolidate its data storage equipment from 19 racks to three and to reduce its need for support staff.
What it does: IBM’s Smarter Planet initiative offers an array of big data solutions for public safety, transportation, social services programs, tax and revenue, and education. These products often include advanced case management and predictive analytics modeling capabilities.
How it’s different: IBM takes a program-specific approach to big data in the public sector. For example, the company works with state unemployment insurance programs to improve claims handling and automate processing of routine claims, leaving only complex matters for case adjudicators.
The company is also focused on how big data can be applied in K-12 education.
“We’re involved in a number of initiatives that really are taking advantage of the explosion of digital content and its relationship to the classroom,” said Gregory Greben, vice president of the public-sector business analytics and optimization practice at IBM Global Business Services.
Reference customer: Police in Fort Lauderdale, Fla., use IBM technology to mash together traditional criminal justice data and information from other city departments to gain new insights on criminal activity. New analysis tools will let the city police department comb through traffic and transportation information, building permits and social media activity in addition to standard criminal justice databases. Correlating these diverse data sets could help the department anticipate where crimes will occur and put cops in the right places to stop them.
What it does: Informatica says its PowerCenter Enterprise product provides a platform for data integration initiatives like data governance, data migration and enterprise data warehousing. It scales to support large volumes of disparate data sources, the company says, turning raw data into actionable information.
How it’s different: According to Informatica, PowerCenter cuts development and deployment time by letting users integrate their own data in a shared graphical environment. The product is designed to help organizations take advantage of big data without requiring knowledge of specialized programming languages or frameworks.
“The data scientist can spend more time doing analytics and science, and they can turn over the more mundane tasks of pipelining the data in … to somebody who knows data, but doesn’t necessarily know Hadoop,” said Todd Goldman, the company’s vice president and general manager of enterprise data integration and data quality. PowerCenter also can automatically clean up “dirty” data produced by RFID sensors and other sources, he added.
Reference customers: Informatica works with a variety of public-sector agencies, most notably the state of Colorado and the IRS. The IRS uses Informatica software to convert data from multiple legacy formats into useful information. Colorado is analyzing student data and human services information to predict student success, and to direct students to appropriate support programs.
What it does: Oracle’s Big Data Appliance is an integrated hardware and software solution for managing and analyzing large-scale data sets.
How it’s different: Oracle’s engineered systems approach offers best-of-breed hardware and software components that are engineered and tested to work together out of the box, the company says. These pre-assembled solutions are designed to be more efficient and easier to deploy.
Mark A. Johnson, director of engineered systems for Oracle Public Sector, added that the company’s Big Data Appliance can tie existing data together without a steep learning curve. For instance, it can take old SQL databases and combine them with new technologies such as Hadoop and NoSQL, allowing users to access new data capabilities through a familiar interface.
Reference customer: The National Cancer Institute (NCI) in the U.S. Department of Health and Human Services needed to search an unstructured data set of 22 million medical abstracts to correlate research studies of a particular genotype that figures prominently in certain cancers. Oracle’s Big Data Appliance, built for Hadoop analysis, ran the query in three days, Johnson said, after the NCI’s staff had spent weeks trying to analyze the data on its own.
What it does: Platfora Big Data Analytics is software that processes data in Hadoop and gives a visual overview of analytics from events, actions and behaviors.
How it’s different: Agencies can use Platfora software to illustrate relatively simple findings, like the number of clicks a website receives by region, but the company is focused on drilling deeper into data, said CEO Ben Werther.
The idea is to look at patterns of behavior across different streams of activity, Werther said, which could be particularly helpful for intrusion detection and other cybersecurity tasks. The software lets users observe net-flow and packet-capture data to spot suspicious activities. “We think that is meaningful — and business users, analysts and regular people can engage in a way that doesn’t require everything to be a statistics problem, which it isn’t,” Werther said.
What it does: SAS offers a range of big data capabilities, but fraud detection is a major focus for the company. The SAS Fraud Framework helps agencies detect fraud, waste and abuse, and the company’s Visual Analytics solution is used by agencies to forecast demand for government services like Medicaid.
How it’s different: SAS software lets users access and analyze data from any type of source, said Paula Henderson, vice president of the company’s state and local government practice.
For instance, SAS Fraud Framework includes an enterprise data management function that collects information from a variety of sources, cleans it up and analyzes it using SAS analytics technology.
Reference customer: North Carolina used SAS technology to create the Criminal Justice Law Enforcement Automated Data Services, a platform that contains data about gun ownership, traffic violations, driving records and other information. Police officers can access the data on the Web, giving them better information during traffic stops and other encounters.
What it does: Splunk specializes in capturing and analyzing machine data, meaning information generated by systems themselves. Splunk Enterprise software monitors and evaluates this data, giving agencies new insight into user behavior, system performance and cyberattacks.
How it’s different: Splunk software analyzes data produced by applications, servers, network devices, security devices and remote infrastructure and presents results in a visual format that’s easy to understand. Splunk also offers a virtual store containing 400 downloadable apps for viewing data from various sources.
Reference customer: One of the company’s latest public-sector customers is the Texas Health and Human Services Commission. The agency uses Splunk software to analyze more than a terabyte of information daily, said Bill Cull, the company’s vice president of public sector. The commission has 200 Splunk users spread throughout the organization, ranging from security and application teams to the deputy commissioner.
“We essentially provide insight into everything from servers, routers, switches, networks and all the way up to what we term operational intelligence where a government customer [is] able to see unprecedented insight into not only errors in the application, but also traffic and usage statistics,” Cull said.
What it does: Teradata provides analytic solutions that support cyberdefense, help prevent fraud, waste and abuse, and improve interactions between government and citizens.
How it’s different: Teradata offers a revenue share model for government agencies lacking the resources to invest in analytic solutions. The company will deploy an entire hardware and software solution for fraud prevention or a similar function and finance the system by taking a percentage of the revenue recovered by the new technology, according to Bobby Caudill, Teradata’s program director for government.
Reference customer: Caudill said 16 states use Teradata technology, including Michigan, which has been working with Teradata since the mid-1990s. In early 2012, the state estimated it was realizing an ROI of approximately $1 million per day using the company’s analytics technology, he said.
What it does: Unisys offers what it calls “Big Data Analytics as a Service,” providing data scientists and data analytics environments on public and private clouds.
How it’s different: The company reduces big data complexity by letting agencies outsource both the technology and the expertise needed to take advantage of sophisticated analytics, said Rod Fontecilla, vice president of application modernization. Customers send Unisys their data, and the company provides business insights and predictive models based on that information.
“People have realized that buying products doesn’t solve your data analytics problems,” Fontecilla said. “And getting a data scientist that really understands how to build predictive models is not easy — they’re very hard to find and train. So the ability to have all that in one place, makes us kind of unique in the marketplace.”