Avoiding Emergency System Downtime in a Disaster

Business continuity software installed in a dispatch center prevents downtime when a summer lightning storm strikes.

by Jeffrey Satkowski / November 15, 2010

Editor’s Note: Jeffrey Satkowski is a systems administrator for the Lapeer County, Mich., Central Dispatch.

I was at the gym working out when our dispatch center called to alert me about a violent electrical storm that was moving through the Midwest plains. It was a summer day this year, and Lapeer County sat directly in the storm’s path. A rural area located 60 miles north of Detroit, Lapeer County is one of five counties that make up what is referred to as “The Thumb,” a peninsula that projects into Lake Huron and forms a sub-region of the Flint and Tri-Cities (Saginaw, Midland and Bay City) area.

Electrical storms always cause more emergency calls to 911 because trees fall, streets flood and lightning strikes buildings. These calls come in to our dispatch center, where trained public safety specialists mobilize emergency services to protect lives and property. If our dispatchers become disconnected from emergency systems, it impairs their ability to make vital and timely life-and-death decisions. When a disaster or emergency occurs, officials need to react immediately.

In this electrical storm, our emergency dispatch center and radio tower were struck by lightning, and everything went black. But with uninterruptible power supplies protecting the servers and emergency generators providing power, we were well prepared. Immediately following the strike, all of the vital emergency dispatch applications hosted on our IT system were online and the dispatchers were able to respond to emergency calls.

The True Disaster Struck the Next Day

However, I arrived at work the next day suspecting that the full effect of the strike had not yet been felt. Despite proper grounding, powerful strikes can weaken systems and cause them to fail later. Sure enough, the server responsible for all our dispatch records crashed during the afternoon. This server ran voice recording software and handled all historical communications records, including telephone and radio traffic to and from the 911 center. While these records are important, they are not mission-critical to operations. After several attempts to restore the server were unsuccessful, we had a spare server delivered and immediately began the installation.

While we were rebuilding the voice recording server and getting it back online, our primary computer-aided dispatch (CAD) server also failed. The CAD server, running VisionCAD provided by VisionAIR, is a vital link in providing emergency services to the public. Without it, our dispatchers have to revert to pen and paper, which severely hampers response times and can cause unacceptable errors in transmitting information.

Preparation Is the Best Way to Avoid Disaster

Fortunately, we had prepared for the worst. Just months before this disaster, we had installed a continuous availability solution to protect our critical data. Our CAD system and SQL database were protected against downtime by the Neverfail Continuous Availability Suite, which maintained a second copy of the CAD system on an alternate server. Because the software also monitors whether systems are working and available to end users, it detected the failure of the primary server and immediately switched CAD operations to the secondary server. As a result, dispatch operations continued seamlessly and we never compromised public safety.
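
For readers who want to see the general idea in code, the sketch below shows one common way a monitor can decide when to switch from a primary server to a secondary one: it counts consecutive failed health checks and promotes the secondary once a threshold is reached. This is only an illustration under stated assumptions, not a description of how the Neverfail product works internally. The names FailoverMonitor, check_primary, max_missed and simulate are all hypothetical, and the health check is simulated rather than sent over a network.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverMonitor:
    """Hypothetical monitor that promotes the secondary after repeated failed checks."""

    check_primary: Callable[[], bool]  # assumed probe: True when the primary responds
    max_missed: int = 3                # consecutive failures tolerated before failover
    active: str = "primary"
    missed: int = 0
    log: List[str] = field(default_factory=list)

    def poll(self) -> str:
        """Run one health check and return the server now handling operations."""
        if self.active != "primary":
            # Already failed over. Switching back is treated as a manual step.
            return self.active
        if self.check_primary():
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.active = "secondary"
                self.log.append("failover: primary -> secondary")
        return self.active


def simulate(health_results, max_missed=3):
    """Feed a list of simulated health-check results (True = healthy) to the monitor."""
    results = iter(health_results)
    monitor = FailoverMonitor(check_primary=lambda: next(results),
                              max_missed=max_missed)
    return [monitor.poll() for _ in health_results]


if __name__ == "__main__":
    # One isolated miss is tolerated; three misses in a row trigger the switch.
    print(simulate([True, False, True, False, False, False, True]))
```

Requiring several consecutive misses before switching is a design choice that avoids failing over on a single dropped check. In this sketch the monitor stays on the secondary once it has switched, which mirrors the deliberate, supervised switch back to the primary described in this article.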

The secondary server kept our CAD system running for two weeks while our team sourced new hardware and worked to get the primary server up and running again. When the primary server was restored, we worked closely with our partner to load software, resynchronize data and finally switch live operations back to the primary system.

The events during and following the storm proved that we had been right to implement a business continuity solution to provide uninterrupted access to our mission-critical CAD system. Since the installation, we have conducted routine tests every month, when we update the Microsoft Windows servers, to make sure the primary server fails over and back seamlessly. We feel lucky to have averted disaster, and we’re working diligently with our technology partners to help ensure we never have downtime so we can protect public safety.
