We are entering the decade of data. Every day, governments collect vast amounts of data from a wide variety of sources. Much of it is pigeonholed into the stovepipes of the programs that requested it and is used, we hope, for a useful purpose. However, this data could be put to better use by making it available to a wider spectrum of agencies and to the public.
Shaun Bierweiler is president at Hortonworks Federal, and vice president of U.S. Public Sector at Hortonworks. He has more than 15 years of experience helping the public sector navigate the intersection of technology and business. He has spent most of the last decade helping agencies leverage big data and enterprise open source solutions to accomplish critical missions. He is a graduate of the University of Florida, and the University of Maryland's Robert H. Smith School of Business.
Bierweiler responded to a series of questions about how government data could be put to better use by making it more widely available.
What challenges come from trying to unleash all the stored government data that is currently sitting idle and unused?
“Big data” challenges are certainly not limited to the government, but they are exacerbated by the massive amounts of data that agencies at all levels of government must collect and store to accomplish their missions.
Many of the challenges stem from the variety of data formats and the velocity at which the data is being generated. Difficulties arise not just with querying across data sets, but also with the ability to discover and access a particular piece of information in a timely manner.
Adding to the growing data problem are the limitations of the legacy systems many government agencies rely on. In many cases, these systems were built for structured data formats, specific use cases and the assumption of disparate data silos. As a result, agencies struggle to access and gain insights across their various data sets.
These challenges significantly affect an agency’s ability to run today’s models and algorithms and gain the insights needed, and expected, to prepare for impending emergencies and manage current ones.
When it comes to emergency management, government agencies have a colossal amount of data ready to be harnessed. Agencies need to be able to gather insights not just from weather events, but also from a diverse range of technologies such as sensors, geolocation events, photographs and social media.
Because there is a gargantuan amount of historical data that is constantly growing in real time, it’s imperative that agencies are able to access all of their data to derive insights that play a direct role in mitigating life-or-death situations. The ability to map all data assets together becomes crucial.
What tools are available to access this stored data, understand it and put it in a usable format?
The open source community includes an expansive group of companies and developers contributing to projects that provide the tools, interfaces and capabilities to make this data not only accessible, but also powerful. Companies like Hortonworks can build upon the innovation of the open source community and make it enterprise-ready and safe for government missions.
To address the challenges described above, enterprise open source providers are developing tools like Hadoop to store massive amounts of data and gain actionable intelligence from that data in a quick, seamless manner. By harnessing the benefits of the open source community, providers can tap into a partner network that enables the visualization of data and provides further insights and analytics.
Other tools like Apache NiFi enable data in motion and allow agencies to move edge sensor data into their environments for processing. This can be an important tool when it comes to emergency management and response, as it enables agencies to automate the movement of data from disparate data sources, like weather monitoring devices, cameras, sensors and more.
The beauty of using an enterprise open source platform in a disaster response setting is that government agencies are supported throughout the entirety of the emergency management life cycle. An enterprise open source solution makes it possible to respond to and recover from disasters in several ways, including using historical data to develop more efficient evacuation strategies and tracking weather events in real time — ahead of any unanticipated changes in a storm's pattern.
Has cloud data storage made this task easier or harder?
Simply put, cloud data storage has done both. On one hand, the elasticity of the cloud has made it easier to store data, but on the other, it has further compounded the challenge of making data accessible. Without a comprehensive data architecture and data strategy, agencies will find it even more difficult to access their data and reap the benefits of security and critical analytics.
However, with the right tools, cloud storage can become one of the most beneficial assets for government agencies. In the early days of cloud computing, storage was often expensive and out of reach for government agencies, especially at the state and local levels. Since then, cloud storage has become much more affordable and quicker to deploy, but having the right technology to enable access to and use of that data can still appear out of reach.
With Hortonworks Data Platform, agencies can gain specific access to their data, whether it is deployed in the cloud, on-prem or both, and can obtain comprehensive insights into that data, thereby addressing the challenges associated with accessing and empowering the government’s data.
There are other opportunities to gather real-time data from cameras, sensors, social media and the like. How can that be done without violating the privacy of individuals?
This is more an issue of policy than a solely technical challenge. However, once an agency has a policy in place, it must have the right technology to implement it smartly and securely. To do so, it needs governance wrapped around each solution to ensure that each implementation is effective and auditable.
At the end of the day, privacy is a question of security, governance and policy. From a technical perspective, it’s important to have the architecture and solution in place that can support the policy from an automation, security and governance perspective. This is key to advancing the capabilities of real-time data analysis whether the deployment is on-prem or in the cloud.
How is all this data going to be used from an artificial intelligence perspective?
Having all the data on a common set of tools in a centralized location makes your algorithms and automated capabilities even more powerful. Recent research has shown that giving simple algorithms access to more data has a catalytic effect on what those algorithms can achieve. By leveraging Hadoop and a cloud-native solution, data science teams can apply their efforts to larger sets of data. These larger data sets will drastically improve the accuracy of the organization’s algorithms.
Security of all this information should be an issue. What cybersecurity approaches work best to protect the data when it is used?
I would break this into two parts. First, agencies must protect the data while it is in use. That requires a consistent implementation of perimeter security, governance and authorization controls to protect your data once it’s centralized into a storage and compute solution, like Hortonworks Data Platform.
Second, an agency needs a holistic cybersecurity perspective for overall protection. The same tools that allow agencies to analyze their data and apply data science can be used to protect against cyberattacks. For example, Hortonworks Cybersecurity Platform draws on all of the data sources that record activity within and around your network, applies machine learning to gain valuable insights and ultimately helps you understand what nefarious actors are already doing on your network.