Government Technology

A Primer on Big Data in State and Local Government



April 30, 2012

Editor's Note: Kapil Bakshi works for Cisco Public Sector, where he leads and sets the strategic direction for solution areas such as Big Data and cloud computing.

Data is a critical asset for state and local government, and it has been for decades. The questions today are what happens when you have too much data, and how you make sense of it when, according to the McKinsey Global Institute, data volume is growing 40 percent per year. How can you keep up with that much data? In today’s digital age, it’s important not only to store large data sets but also to use the data to make mission-critical decisions. This emerging field of data analytics is being called Big Data.
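The Python sketch below compounds the McKinsey growth figure year over year to show its scale. It is a simplification that assumes the 40 percent rate holds constant. The 100 TB starting volume and the function names are hypothetical.

```python
import math


def projected_volume(initial_volume: float, annual_growth: float, years: int) -> float:
    """Project data volume assuming a constant compound annual growth rate."""
    return initial_volume * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years for volume to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Hypothetical agency holding 100 TB today, growing 40% per year.
for year in (1, 3, 5):
    print(f"Year {year}: {projected_volume(100, 0.40, year):.1f} TB")
print(f"Doubling time at 40% per year: {doubling_time(0.40):.2f} years")
```

At a steady 40 percent a year, stored volume roughly doubles every two years and grows more than fivefold in five years.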

An enormous amount of data is generated as a result of factors such as:

mobility: mobile devices, mobile events, and sensor integration;

data access and consumption: the Internet, sensors/actuators, interconnected systems, social networking, and convergent interfaces and access models (Internet, search and social networking, and messaging); and

information model and open source: major changes in the information processing model and the availability of an open source framework.

State and local governments must also look at the type and source of data being collected, stored, analyzed, and consumed: in particular, whether it is structured or unstructured. Unstructured data is information that either doesn’t have a predefined data model or doesn’t fit well into relational tables; examples include text, log files, video, audio, and network-type data sets. Structured data has been modeled and normalized to fit into a relational model, such as traditional row/column databases. Big Data is a combination of structured and unstructured data and typically draws on the following sources:

traditional enterprise data: enterprise information data stores — from customer relationship management (CRM) and enterprise resource planning (ERP) systems to payroll and Web store transactions;

machine-generated or sensor data: scientific data, telemetry data, smart meters, network sensors, call/event detail records, weblogs, equipment logs and trading systems data; and

social web data: customer feedback streams, microblogging sites like Twitter, and social media platforms such as Facebook.
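The sketch below illustrates the structured/unstructured distinction. It turns one machine-generated weblog line into a structured record that could be loaded into a row/column table. It assumes the widely used NCSA/Apache Common Log Format. The sample line, field names, and function name are hypothetical.

```python
import re
from typing import Optional

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_weblog_line(line: str) -> Optional[dict]:
    """Turn one unstructured weblog line into a structured record (a row)."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # not in the expected format; route elsewhere
    record = match.groupdict()
    record["status"] = int(record["status"])
    # "-" in the size field means no body bytes were returned
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '192.0.2.10 - - [30/Apr/2012:10:15:32 -0700] "GET /permits/status HTTP/1.1" 200 5120'
print(parse_weblog_line(sample))
```

Lines that don't match the pattern return `None`. That case is common with unstructured sources, where not every record fits a single model.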

Three key characteristics define Big Data. They are known as the three “Vs”:

volume: machine-generated data is produced in much larger quantities than traditional data;

velocity: how quickly data moves across an enterprise; and

variety: not just relational data stores, but also the unstructured data in the enterprise.

Over the years, traditional enterprise data models have needed more and more application, database, and storage resources, and their cost and complexity have grown as well. This rapid change has prompted a shift in the fundamental models that describe how Big Data is stored, analyzed and consumed. The new models are built on scaled-out, “shared-nothing” architectures, in which each node has its own processor, memory, and storage and nodes cooperate only over the network. This shift brings new challenges to governments that are deciding what new technologies to use, and where and how to use them. To manage it, two building blocks are being added to the enterprise technology stack to accommodate Big Data:

Hadoop: provides storage capability through a distributed, shared-nothing file system and analysis capability through MapReduce; and

NoSQL: provides the capability to capture, read and update, in real time, a large influx of unstructured and schema-less data.
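The sketch below is a single-process Python simulation of the MapReduce model, not Hadoop’s actual API. It counts words in hypothetical service-request text. Each simulated node maps only its own slice of the input, in the shared-nothing style. A shuffle step then routes each word to the one node that owns it, and that node reduces the counts. Assigning keys to nodes with a CRC-32 hash is an assumption for illustration. The node count, function names, and records are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

NUM_NODES = 3  # hypothetical cluster size


def owner(key: str) -> int:
    """Pick the node that owns a key using a stable hash (CRC-32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def map_phase(records: List[str]) -> List[Tuple[str, int]]:
    """Map: a node emits (word, 1) for every word in its own records."""
    return [(word.lower(), 1) for record in records for word in record.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> List[DefaultDict[str, List[int]]]:
    """Shuffle: route every (key, value) pair to the node that owns the key."""
    buckets: List[DefaultDict[str, List[int]]] = [defaultdict(list) for _ in range(NUM_NODES)]
    for node_output in mapped:
        for key, value in node_output:
            buckets[owner(key)][key].append(value)
    return buckets


def reduce_phase(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: a node sums the values for the keys it owns."""
    return {key: sum(values) for key, values in bucket.items()}


def map_reduce(records: List[str]) -> Dict[str, int]:
    # Spread input across nodes; no node reads another node's slice.
    slices = [records[i::NUM_NODES] for i in range(NUM_NODES)]
    mapped = [map_phase(s) for s in slices]
    reduced = [reduce_phase(bucket) for bucket in shuffle(mapped)]
    # Each key has exactly one owner, so per-node results never overlap.
    result: Dict[str, int] = {}
    for part in reduced:
        result.update(part)
    return result


complaints = [
    "Pothole on Main St",
    "Streetlight out on Main St",
    "Pothole near school",
]
print(map_reduce(complaints))
```

In a real Hadoop cluster, the map and reduce tasks run on separate machines, and the framework handles partitioning, shuffling, and fault tolerance. The sketch only mirrors the data flow.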

Given these details, what can state and local governments do to prepare for and embrace Big Data? First, they should try to get ahead of their data deluge; strategy and planning are critical to this process. Second, they must develop and review the life cycle for Big Data in their enterprises. The life cycle can be divided into the following phases:

capture: the collection of data from a diverse set of sources, as described previously;

store: the repository for the collected data — the right kind of data needs to be stored in the correct repository;

analyze: the analytics of the data in the repositories; and

consume: the reporting and business intelligence for decision-making.
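The four phases can be sketched as a small pipeline, shown below in Python. This is a hypothetical illustration, not a reference architecture. Two in-memory lists stand in for a relational repository and a schema-less one, reflecting the point that each kind of data belongs in the right repository. The class, function, and source names (payroll, feedback) and all records are invented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Repositories:
    """Hypothetical in-memory stand-ins for a relational and a schema-less store."""
    structured: List[Dict[str, Any]] = field(default_factory=list)
    unstructured: List[str] = field(default_factory=list)


def capture(sources: Dict[str, List[Any]]) -> List[Any]:
    """Capture: collect raw items from every source."""
    return [item for items in sources.values() for item in items]


def store(items: List[Any], repos: Repositories) -> None:
    """Store: route each item to the repository that suits its shape."""
    for item in items:
        if isinstance(item, dict):
            repos.structured.append(item)  # already fits a row/column model
        else:
            repos.unstructured.append(str(item))  # free text, logs, etc.


def analyze(repos: Repositories) -> Dict[str, Any]:
    """Analyze: compute simple metrics across both repositories."""
    amounts = [row["amount"] for row in repos.structured if "amount" in row]
    return {
        "transactions": len(amounts),
        "total_amount": sum(amounts),
        "text_items": len(repos.unstructured),
    }


def consume(metrics: Dict[str, Any]) -> str:
    """Consume: turn the metrics into a one-line report for decision-makers."""
    return "; ".join(f"{name}: {value}" for name, value in metrics.items())


sources = {
    "payroll": [{"employee": "A", "amount": 2500.0}, {"employee": "B", "amount": 3100.0}],
    "feedback": ["Bus 12 was late again", "Great service at the permit office"],
}
repos = Repositories()
store(capture(sources), repos)
print(consume(analyze(repos)))
# prints: transactions: 2; total_amount: 5600.0; text_items: 2
```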

Once the Big Data life cycle is well understood, plan for and identify the following:

Find technology enablers: These could include new infrastructure, evaluations of software applications, and pilot projects.

Adopt an ecosystems approach: Big Data is a new and emerging space, and there will be several upcoming technology options to review and select.

Adopt a use case-based approach: The value of data depends on insight into its domain. Hence, look for use case-specific projects, for example, network-centric Big Data analytics, cybersecurity, or video-based insights.

Invest in data-centric skill sets: The insights drawn from these large data sets are only as good as the domain knowledge applied to them. Therefore, skills for data analysts and scientists need to be developed and nurtured.

Kapil Bakshi is a native of the Washington, D.C., area, and holds bachelor’s degrees in electrical engineering and computer science from the University of Maryland, College Park; a master’s degree in computer engineering from Johns Hopkins University and an MBA from the University of Maryland, College Park. Bakshi has held several positions within the IT industry, including at Cisco, Sun Microsystems and Hewlett-Packard.


You may use or reference this story with attribution and a link to
http://www.govtech.com/e-government/Primer-Big-Data-State-Local-Government.html



Comments

Tom Tomlin    |    Commented May 2, 2012

Kapil, I don’t disagree with a word you say, but I would point out that effective utilisation of ‘big data’ has only just begun, even among organisations seeking to make a profit from it (which therefore have a very solid business case for it). Outside of the defence ‘function’ as part of the role of Government, I haven’t seen more than 3 or 4 examples actually get off the ground. This for me is beyond frustrating, as I don’t believe there is a (reasonably) IT-enabled Government anywhere in the world that does not have enough data collected and stored to radically change the way it formulates policy. Having a truckload of data on absolutely anything allows anybody to start the move into evidence-based policy making and away from the squeaky wheel of political or civil-servant self-interest.

My question is, in the context of Government (and I don’t think it’s just at the state and local level, either here in Australia or in the US), what will drive Government to even seek to utilise the first 10% of the data it already holds? As with VMware’s early promise to increase processing capacity on a server from 9% to 45%, what are the hot buttons that will get Government to start utilising big data?

I have worked on two such projects locally (distributed learning across 2800 schools and transport planning for 600M journeys across 3 modes of public transport). While presenting evidence of benefits ranging from cost to service delivery capacity, I have not found anybody, elected or otherwise, who was not frightened rigid by the prospect that they could actually base their policy formulation on what is evidenced to be appropriate and actually needed. I’d be very interested to hear from you (or anybody!) who might have any answers.

Larry    |    Commented May 10, 2012

Agree with Tom Tomlin. If no one is interested in the 'consume' phase, why make a big investment? An evidence-based decision-making culture must precede investments in big data management. Otherwise, lots of taxpayers' dollars will be spent on useful data that is never used.

