Editor’s note: The following is the first installment of the Digital Communities special section in the September issue of Government Technology magazine.
Attack of the Petabytes
Imagine a completely Internet-connected world where humans are deluged with data and starved for information, where more than 70 percent of all email is spam, where cars, electrical meters, phones, computers, televisions, refrigerators, eyeglasses, traffic lights, sewers and even clothing are connected and busily sending and receiving data at the speed of light — or in some cases, at the speed of dialup. A world where social media information explodes, where national governments can scoop up every phone conversation, email, text message or photo, browse through social media, block access to “undesirable” information and attack critical infrastructure with incalculable sophistication. It doesn’t take much imagination to conceive of such a place; it’s happening today.
Can cities and counties survive the attack of the petabytes (1,000 terabytes), let alone find the exact information that can help make life better? Something big this way comes, and now nearly everyone can hear the thump of its heavy tread. In this time and in this place, analytics may be mankind’s last hope.
All this data running rampant, if managed for good instead of evil, has the possibility of benefiting us humans in our pursuit of a better life. Yes, if Edward Snowden is telling the truth, every telephone call, email, text message and social media post can be scooped up by the National Security Agency and neatly archived. In fact, the NSA is building a $1.2 billion intelligence-gathering facility in Utah capable of collecting and storing zettabytes of data on everything moving through the air, wire or Internet. On the other hand, NSA Director Keith Alexander says that this mass surveillance has stopped some 50 terrorist attacks. Striking a balance between beneficial surveillance and privacy-robbing snooping figures to be the subject of a continuing national debate.
But because of its potential for good, big data — as this subject is called — has become more and more interesting to state and local governments. It creates a giant pot of data that can be sifted through to find things like leaking water or gas pipes, bird flu outbreaks, tax dodgers, sex offenders and dangerous intersections. Even more interesting is big data’s potential to be predictive — spotting bridges that might collapse, determining which inmates probably won’t reoffend if given early release, plotting where police should go to prevent homicides.
Collecting data isn’t enough, of course; it must be examined. Sifting through these mountains of information manually isn’t an option because there just aren’t enough people to do it. As computing power improves and analytics software becomes more sophisticated, these tools are being used to scour data for useful patterns, hidden correlations and other insights to improve decision-making.
“The goal of analytics is not to have the decision disappear inside the computer,” said Katharine Frase, IBM public-sector vice president and CTO. “The intention is to enable humans to make decisions with better evidence. In the case of the Watson system that played Jeopardy, you ask the system a question after it’s been trained and it recommends some possible answers, with the evidence behind why it’s making that recommendation.”
To illustrate, Frase cited a water project in Dubuque, Iowa. The city was installing smart water meters and wanted to know the most effective way to present meter data to ratepayers. Essentially, city leaders needed to know what type of data would prompt citizens to take action.
They assumed that some water customers would immediately search for a leak if their meters showed water flowing at 2 a.m. But many others wouldn’t bother. Cost-sensitive customers would change habits to reduce their water bills if rates varied by time of day. But how large is that population? Ultimately, one tactic that worked well was showing residents how their water use compared with that of their neighbors. It was somewhat competitive, and it got many residents engaged in conservation, resulting in an estimated 7 percent reduction in water usage.
Dubuque applied the same methodology to electric service — on the principle that “insight into usage patterns can provide the basis for more intelligent electricity consumption decisions,” according to an IBM report.
The energy pilot gathered usage information through smart meters in 1,000 households and applied it to analytical algorithms running in the cloud, said the report. Residents could view information on the best way to minimize consumption during peak usage periods. Social networking helped residents compare power consumption patterns, and households employing the solution cut their electricity use by as much as 11 percent.
So when people talk about analytics, Frase said, they are talking about analyzing multiple forms of data to be able to predict what will happen and to prescribe the best response. “At the end of the day,” she said, “it’s humans in the city that actually take action, so how do you engage them in that activity?”
New York City Gets Proactive
You have millions of trees that could drop broken limbs on the public. You have plugged sewers because some restaurants dump grease down the storm drain or the toilet. You have illegal construction creating fire hazards. So who do you call? If you’re New York City Mayor Michael Bloomberg, you call Michael P. Flowers.
Flowers, a number cruncher in the NYC Office of Policy and Strategic Planning, is no ordinary geek. On a trip to Afghanistan, he saw how the military used analytics to predict where improvised explosive devices were likeliest to be planted. The experience prompted Flowers to start using analytics in the city. Last September he was named a White House Champion of Change for using analytics to help New York City tackle some tough problems.
Where to Begin?
Most people think you fight fire with water, ladders and axes, and that’s true. But what if you could stop fires before they ignite? That’s the idea behind building inspections. But New York City has 20,000 complaints each year about illegal conversions of buildings — jammed with partitions, hotplates, extension cords and extra families — and only 200 inspectors. Each illegal conversion is a potential catastrophe, and illegal conversions are a large part of the 2,000 serious fires per year in buildings that house one to three families. How could the city prioritize the complaints so that inspectors would visit the worst buildings first?
Flowers and his team explained the process in a department video. They started with a spreadsheet and began adding data from different city departments. The city worked with all kinds of data to begin with, but four indicators were key, said Benjamin Dean, chief analyst in the analytics unit: unpaid taxes, foreclosure proceedings against the owner, construction before the building code revisions of 1938, and the neighborhood’s socioeconomic status.
In the video, Deputy Assistant Chief Joseph Woznica of the NYC Fire Department said at first he didn’t think it would work: there were too many turf issues, and too little cooperation among departments, to merge all this data. But those obstacles were overcome and the information was combined, enabling high-risk buildings to be targeted first.
The vacate rate, the share of inspections that found buildings so dangerous they were unfit for human habitation, shot up from 13 percent to 70 percent when complaints were prioritized with analytics, a huge jump that astonished everyone, including Flowers. The result is that New York City can now target its limited resources where the problems are likeliest to be found.
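The prioritization idea can be sketched in a few lines of Python. This is only an illustration, not the city's actual model: the `Complaint` fields mirror the four indicators Dean named, but the weights, the yes-or-no simplification of neighborhood status, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Complaint:
    address: str
    unpaid_taxes: bool
    in_foreclosure: bool
    built_before_1938: bool
    low_income_area: bool  # assumed yes/no stand-in for socioeconomic status


# Hypothetical weights; the article does not say how the indicators were weighted.
WEIGHTS = {
    "unpaid_taxes": 3,
    "in_foreclosure": 3,
    "built_before_1938": 2,
    "low_income_area": 1,
}


def risk_score(complaint):
    """Add up the weights of every indicator that is true for this complaint."""
    return sum(w for field, w in WEIGHTS.items() if getattr(complaint, field))


def prioritize(complaints):
    """Return complaints ordered from highest to lowest risk score."""
    return sorted(complaints, key=risk_score, reverse=True)
```

Inspectors would then work down the sorted list, so the buildings with the most warning signs are visited first.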
That approach was applied in other areas too.
To locate restaurants pouring cooking fat into the sewers, the city correlated restaurant locations, a map of the sewer system, reports of calls for sewer clogs, and a list of restaurants with no contracts with companies that haul away waste cooking oil. That narrowed the focus down to a few suspects that could be targeted.
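A rough sketch of that kind of cross-referencing follows. It assumes, purely for illustration, that each restaurant and each clog report has already been matched to a sewer segment ID; the function name, the data layout and the `min_clogs` threshold are hypothetical and not taken from the city's system.

```python
from collections import Counter


def grease_suspects(restaurants, clog_reports, hauler_contracts, min_clogs=2):
    """Flag restaurants on clog-prone sewer segments that have no grease hauler.

    restaurants:      dict mapping restaurant name -> sewer segment ID
    clog_reports:     list of sewer segment IDs, one entry per reported clog
    hauler_contracts: set of restaurant names with a grease-hauling contract
    """
    clogs_per_segment = Counter(clog_reports)
    return sorted(
        name
        for name, segment in restaurants.items()
        if clogs_per_segment[segment] >= min_clogs
        and name not in hauler_contracts
    )
```

The output is a short list of names to investigate, not proof of dumping; an inspector still has to confirm what is happening on site.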
To locate areas where trees were likely to drop limbs on the heads of passers-by, the city correlated data on which trees had been trimmed and when with data on calls to remove fallen limbs and uprooted trees. The results were used to target for trimming those areas with trees most likely to drop limbs. The data also revealed that trimming trees one year reduced calls the following year by more than 20 percent.
Besides turf issues, another challenge was data format. It’s impractical to map millions of individual trees. But the city did have records of tree trimming and cleanup of downed limbs and debris, as well as reports of injuries. However, the records came from different departments — and where one department may have noted location by street address, another may have used city block designations. To combine big data into a useful analytics engine, the data formats must first be reconciled. New York City settled on city blocks as the designated location standard.
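The reconciliation step might look something like the sketch below. It assumes a lookup table from street address to city block already exists; `ADDRESS_TO_BLOCK`, the block IDs and the record layout are invented for the example and do not describe the city's actual data.

```python
from collections import defaultdict

# Hypothetical lookup table from cleaned-up street address to city block ID.
ADDRESS_TO_BLOCK = {
    "12 ELM ST": "block-101",
    "48 OAK AVE": "block-102",
}


def normalize_address(address):
    """Uppercase, drop periods and collapse repeated spaces."""
    return " ".join(address.upper().replace(".", "").split())


def to_block(record):
    """Return the record's city block, whichever way its location was noted."""
    if "block" in record:
        return record["block"]
    return ADDRESS_TO_BLOCK.get(normalize_address(record["address"]))


def counts_by_block(trim_records, limb_calls):
    """Tally trimming jobs and fallen-limb calls per city block."""
    totals = defaultdict(lambda: {"trims": 0, "limb_calls": 0})
    for kind, records in (("trims", trim_records), ("limb_calls", limb_calls)):
        for record in records:
            block = to_block(record)
            if block is not None:  # records with unknown addresses are skipped
                totals[block][kind] += 1
    return dict(totals)
```

Once every record carries the same kind of location, records from different departments can be counted side by side. In practice the skipped records would need a manual review rather than being silently dropped.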
The city has a state-of-the-art big data and analytics methodology, and media reports of those successes have helped other cities incorporate analytics into improving safety, providing better services to the public and focusing on those things that give the most bang for the buck.
Getting Ahead of Criminals
The movie Minority Report — in which crimes were predicted and individuals arrested before they could offend — was science fiction. But CompStat is a method for predicting crime that really works. And it started in New York City.
Back in the early 1990s, a New York City subway cop named Jack Maple began to map where crime occurred in the subway, noting which stops were involved and at what time of day. These maps, which he called “charts of the future,” were drawn in crayon on 55 feet of butcher paper. The charts helped predict where and when crimes were likely to occur, so officers could be assigned accordingly. Between 1990 and 1992, they helped cut subway felonies and robberies by nearly one-third. NYPD Commissioner William Bratton later incorporated the system into all NYPD operations and today, police departments around the world use that system or some variation of it.
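In software terms, the butcher-paper charts amount to counting incidents by place and time and looking for the biggest clusters. The following is a minimal sketch of that idea, not a description of CompStat itself; the function name and the sample data layout are assumptions.

```python
from collections import Counter


def hot_spots(incidents, top=3):
    """Count incidents per (station, hour) pair and return the busiest pairs.

    incidents: iterable of (station_name, hour_of_day) tuples
    """
    counts = Counter((station, hour) for station, hour in incidents)
    return counts.most_common(top)
```

A commander could read the result as a staffing hint: the station and hour at the top of the list is where past incidents clustered most.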
And while “charts of the future” use historical data to predict, with a great deal of success, where and when crimes will occur, gunshot detection systems bring analytics into the present, up to the second. The systems, in use in many metropolitan areas, locate a sound by triangulation and determine whether it is actually a gunshot. If so, they provide GPS coordinates or an address, tally the number of shots, and can determine the direction the muzzle was pointing, whether the shooter is moving or stationary, and the number of weapons fired. This real-time data allows law enforcement to locate evidence or a victim within a few yards, and in some cases intercept the shooter.
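One way to locate a sound from several microphones is to compare when it reaches each one. The sketch below uses that arrival-time approach with a simple grid search; the article does not say which method any particular system uses, and the sensor layout, search area and step size here are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, approximate value in mild air


def locate(sensors, arrival_times, area=(0.0, 0.0, 1000.0, 1000.0), step=5.0):
    """Estimate where a sound came from using arrival times at known sensors.

    sensors:       list of (x, y) sensor positions in meters
    arrival_times: time in seconds at which each sensor heard the sound
    area:          (x_min, y_min, x_max, y_max) region to search
    step:          grid spacing in meters
    """
    x_min, y_min, x_max, y_max = area
    best_point, best_error = None, math.inf
    for i in range(int((x_max - x_min) / step) + 1):
        for j in range(int((y_max - y_min) / step) + 1):
            x, y = x_min + i * step, y_min + j * step
            travel = [math.hypot(x - sx, y - sy) / SPEED_OF_SOUND
                      for sx, sy in sensors]
            # The moment of firing is unknown, so compare time differences
            # relative to the first sensor instead of absolute times.
            error = sum(
                ((heard - arrival_times[0]) - (predicted - travel[0])) ** 2
                for heard, predicted in zip(arrival_times, travel)
            )
            if error < best_error:
                best_point, best_error = (x, y), error
    return best_point
```

With four sensors at the corners of a square kilometer, the search returns the grid point whose predicted timing pattern best matches what the sensors recorded. A real deployment would also have to handle echoes, wind and clock drift, which this sketch ignores.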
Such systems also have a predictive value. Only about 20 percent of urban gunshots are ever reported to police by the public. So gunshot detection systems give a much more accurate indication of the number and location of shots fired. And thieves often test stolen weapons before selling them. For instance, a detection system in a large West Coast city detected many “confidence” shots fired over time at a high school athletic field, which enabled law enforcement to arrest the perpetrators, solve a number of weapons thefts and prevent potential crimes.