The world is overwhelmed by data, and for as far ahead as we can see there will be more of it than we can absorb. Studies by IBM and Cisco have concluded that 90 percent of the data in the world today was acquired in the past 18 months, and that this vast store will double every 18 months for the foreseeable future. The same studies calculate that we generate 2.5 quintillion bytes (about 2.5 exabytes) of data each day.
A major source of this growing store of data is the Internet, including social media, online shopping, and the posting of enormous amounts of detail about everything from movies to books to newspapers.
As Internet use grew, fast-growing companies that supported searches across the Internet's data repositories, or that captured personal consumption records in the course of doing business online, realized that a massive amount of data about people was being collected. If they could analyze and sort through all of this data, they could introduce a new model of personalized interaction, provided there was enough detailed data to infer individual preferences.
This commercial drive led to the creation of what is now called big data. It is based on a new approach to analysis that uses highly sophisticated models together with Hadoop, an open-source framework that pairs a distributed file system with a processing model designed to break storage and computation down across hundreds or thousands of individual computers. The low end of the Hadoop processing world covers hundreds of gigabytes.
The Hadoop framework is tied to a new generation of analysis tools built to work within it. The result is that detailed analytical work on large data repositories can be done far faster than was possible with analytical tools limited to extracting data from relational databases.
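For readers who want to see the idea of breaking a computation down across many computers, here is a minimal sketch in Python. It is not Hadoop code and does not use any Hadoop API; it only imitates the split, map and reduce pattern on a single machine. The function names and the sample records are made up for illustration, and in a real cluster each chunk would be handled by a separate computer rather than in a simple loop.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Divide the records into roughly equal chunks, one per hypothetical worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: a worker counts the categories in its own chunk only."""
    return Counter(record["category"] for record in chunk)


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one overall tally."""
    return reduce(lambda a, b: a + b, partials, Counter())


def count_by_category(records, n_chunks=4):
    """Run the whole split / map / reduce pipeline on one machine."""
    chunks = split_into_chunks(records, n_chunks)
    # On a cluster these map calls would run in parallel on different computers.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(partials)


# Made-up sample records, for illustration only.
SAMPLE_RECORDS = [
    {"id": 1, "category": "search"},
    {"id": 2, "category": "purchase"},
    {"id": 3, "category": "search"},
    {"id": 4, "category": "post"},
    {"id": 5, "category": "purchase"},
    {"id": 6, "category": "search"},
    {"id": 7, "category": "post"},
]

if __name__ == "__main__":
    print(count_by_category(SAMPLE_RECORDS, n_chunks=3))
```

The point of the design is that no worker ever needs to see the whole data set: each one summarizes its own chunk, and only the small summaries are combined at the end.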
The McKinsey Global Institute (MGI), a highly respected thought leader in international circles, has published a report describing big data as “The next frontier for innovation, competition, and productivity,” speculating that all economic sectors will see monumental impacts from the use of big data. In particular, MGI expressed confidence that the government sector will see a distinct and major impact.
We still have a lot to figure out about this new way of thinking and working, and some experts have suggested that the major IT companies are moving too fast in declaring themselves big data companies. Just about every major IT firm already offers either connections to Hadoop or some kind of parallel processing analysis engine. Big data is definitely here and in use in the commercial world.
Recently, the National Association of State Chief Information Officers (NASCIO) issued a report urging caution about getting too excited about big data until some basic issues are resolved, such as creating a clear and common definition of big data, developing a clear business case, and creating a framework for governance of big data.
But this train has left the station, and industry, as well as many government agencies, has recognized the enormous potential of big data in improving decision making at all levels of government. Rutrell Yasin writes in Government Computer News about “the Big Data Commission of 22 experts and academics who will provide guidance to government and business on how the large troves of data being collected today can drive U.S. innovation and competitiveness.”
Big data is about using modern analytics to find patterns, increase knowledge, make predictions, and inform executives and line management so that better and more useful outcomes can result. Viewed in this light, this is exactly what many progressive thinkers are trying to do in public safety and justice — seeking new practices and tools, such as the use of big data and predictive analytics, to inform decision making. With this frame of reference, we should surely explore the potential of big data in public safety and justice.
There are two key parts of this assessment and evaluation: (1) determining what data can be brought to bear on the prevention of crime and terrorism, and how the various data sources can be combined and treated as a single body of big data; and (2) getting the full measure of value from this new approach to improving performance and effectiveness. A lot of study, thinking and experimentation is needed to make big data directly supportive of the missions involved in public safety and justice.
This blog originally appeared on the Integrated Justice Information Systems Institute website. It is reprinted with permission from the author.
Paul Wormeli is executive director emeritus of the Integrated Justice Information Systems Institute, a nonprofit corporation formed to help state and local governments develop ways to share information among the disciplines engaged in homeland security, justice and public safety.