Disaster Recovery Planning Gets No Respect

As the use of computers increases in government, so too should disaster recovery planning. But this critical aspect of information technology is often overlooked.

Helena, Mont., may seem like a safe haven. It's located far from Tornado Alley and the coastlines that are vulnerable to hurricanes. Floods are not a problem, nor are earthquakes. Still, the state's Information Services Division (ISD) isn't taking any chances.

It has a contract for a hot site at a recovery center located in another state. And plans are under way to move ISD's cold site, now located in Helena, to another location about 50 miles away -- just in case. "We call it our hope chest," said Leslie Cummings, ISD's disaster recovery coordinator, referring to resources kept at the cold site. "It contains all of the detailed items of what we need to recover our data center."

But Cummings admits that the state's overall recovery plan is far from complete. "At this point in time, the only thing we can recover 100 percent is the data center's mainframe," she said. Still vulnerable are a number of mid-range systems as well as numerous agency systems on both distributed and centralized platforms.

Montana, like many other state and local governments, has found that its disaster recovery plans and budget have not kept pace with the rapid growth in computing. Once, government computers -- typically mainframes -- handled a few core applications such as payroll, accounting, drivers' licenses, retirement and workers' compensation.

Today, however, automation has permeated every agency and department, from the governor's or mayor's office to procurement offices, teacher certification bureaus and everything in between. Many of the new systems run in a distributed computing environment, such as client/server. Because these systems are outside the control of data centers, it is doubtful how much disaster planning they receive.

"Most of the applications that have gone to client/server leave the scope of our data center," pointed out Michael McVicker, assistant director for the state of Washington's Department of Information Services (DIS). Recovery plans at DIS are comprehensive and up-to-date, but outside its doors, recovery planning is another matter. "I'm not involved in disaster recovery at that level," said McVicker, referring to the systems and data implemented and used by the state's 155 agencies.

SILENT KILLER
A study conducted by the University of Texas in 1987 revealed that even a decade ago, the impact of a computer outage was profound:

Eighty-five percent of organizations were heavily or totally dependent upon computer systems.
On average, by the sixth day of an outage, companies experienced a 25 percent loss in daily revenue. By the 25th day, daily revenue loss was 40 percent.
Within two weeks of the loss of computer support, 75 percent of organizations reached critical or total loss of their functions.
Forty-three percent of companies that experienced a disaster -- but had no tested business recovery plan in place -- never reopened.
Organizations estimated that their revenue losses would be two-and-a-half times as severe if their contingency plans were not activated.

While government agencies don't go out of business when a disaster strikes, prolonged outages can hurt the taxpayers. "There are too many citizens who require services from the state for us to say we're going to take a chance and not have a recovery plan for unemployment compensation or AFDC (Aid for Families with Dependent Children)," said McVicker.

Few governments can afford the cost of a computer disaster. For example, six years ago, the state of Washington conducted a study that determined a 30-day outage for the state's data center would cost Washington taxpayers hundreds of millions of dollars.

The potential repercussions from a computer disaster in government are significant, and yet agencies drag their feet when it comes to developing and maintaining adequate recovery plans. At the same time, a lack of resources to support recovery plans is making a difficult job even harder.

"If you don't have a plan in place and don't have it tested, it's like having a silent killer," said Mike Zanon, a manager for Oregon's Information Resources Management Division. "Unfortunately," he added, "until there's a crisis, people don't respond to disaster planning and take the necessary steps to prevent a long-term outage."

Zanon compared agency attitudes toward disaster recovery planning to a dentist visit. "We all know we should brush our teeth and have regular checkups, but sometimes, given how busy we are, people don't always attend to it as they should, because there's no immediate payback."

Zanon believes that agencies in Oregon are doing an adequate job backing up their data, but budget cutbacks in his department have curtailed recovery plan audits, which were routine in the past.

Meanwhile, evidence is mounting that recovery planning is not keeping pace with the proliferation of distributed computer systems. Numerous studies have revealed that the majority of local-area and wide-area networks have no established recovery plans, according to a disaster recovery white paper produced by the Deloitte & Touche Consulting Group. It goes on to say that while "the business world increasingly is employing networks of microcomputers as cost-effective alternatives to mainframe problems, many have not acknowledged the mission-critical nature of these systems."

INSURANCE POLICIES
Setting policy for disaster recovery is one way to ensure that a computer outage doesn't bring business to a screeching halt. In Washington, disaster recovery policy is set by the state's Information Services Board, an umbrella organization that has authority over the use of information technology by all state agencies.

The policy covers all types of computing platforms, including mainframes, mid-range computers, file servers and PCs. According to McVicker, the policy requires "all agencies to plan, implement, maintain and test disaster recovery plans, train their employees to execute the recovery plan and, in the event of a disaster, take the necessary steps to mitigate the impact."

Oregon has a similar policy. "It's not very descriptive," said Zanon. He explained that the policy is purposely general because the agencies are so diverse in size, levels of experience and the amount of resources they have. "We try to set the boundaries of the playing field," continued Zanon. "We talk in general terms about the things you should address in policy and then give the agencies some freedom or restrictions on how they can carry out the policy."

In Montana, the state has directed its Information Services Division to develop what it calls an enterprise disaster recovery plan, which includes all facets of the state's business continuity and data center recovery plans. The idea is that the plan will serve as a boilerplate for helping agencies develop their own business continuity and disaster recovery plans and procedures.

Testing is a major component of disaster recovery planning. Some disaster recovery firms specify a minimum of one test per year, but most recommend more. Washington's DIS tests its plan twice a year. "We go to our hot site facility, reload agency information, restore the application and then test it to verify that it's alive and well," explained McVicker. In Montana, ISD runs a full test once a year and a scaled-down version semiannually.

Tests reveal any weaknesses in plans. One area that can suffer is training. Recovery plans require teams of workers to carry out the procedures that can bring an application back into operation. Deloitte & Touche identifies as many as 10 planning areas, each requiring a recovery team. Since few staff work full-time at disaster recovery, team members need training to keep their skills up to speed. Cummings said training is pretty much a self-guided operation in Montana, although she has purchased training videos to help staff.

Tests can also expose another weakness, namely the lack of planning discipline and continuity. McVicker pointed out that everything done by Washington's data center has a disaster recovery component -- software upgrades, hardware enhancements or procedural changes, for example.

"What's really critical is establishing the disciplines so that as you make those changes on a day-to-day basis, you are always asking: 'what's the impact of this change on our disaster recovery plan?'"

McVicker added that these changes all have to be documented and duplicated off site. "When you have a disaster and have to send all your people off site, there's an assumption that all the processes, procedures and changes that have occurred at the data center have been duplicated off site," he said. When discipline breaks down and the duplication doesn't occur, then bringing up an application off site will be difficult, if not impossible.
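The off-site duplication McVicker describes can be checked mechanically. The following Python sketch is one possible illustration and not a description of Washington's actual procedures: it assumes the primary and off-site copies are both reachable as directories, and the function names `manifest` and `out_of_sync` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative_name = path.relative_to(root).as_posix()
            digests[relative_name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def out_of_sync(primary: Path, offsite: Path) -> List[str]:
    """List files that are missing off site or differ from the primary copy."""
    source = manifest(primary)
    copy = manifest(offsite)
    return sorted(name for name, digest in source.items() if copy.get(name) != digest)
```

An empty list from `out_of_sync` would suggest that every document and procedure held at the primary site has an identical copy off site. Any names it returns point to changes that were never duplicated.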

Because it takes so much discipline to cover all the bases, recovery plans can take time to complete. In Washington, it took 1.5 years and three tests before the entire plan was considered ready. Montana's ISD has been working for almost two years to get its recovery plan completed and fully tested.

SOFTWARE TOOLS
While disaster recovery planning is primarily an effort involving organizational and management skills, technology can lend a hand. Disaster Recovery Journal surveyed the field of PC-based disaster recovery planning software and came up with a list that's five pages long. Prices for the commercial software packages range from a few hundred dollars to tens of thousands of dollars. Most packages operate as a database, a word processor or both, with some able to conduct an impact analysis for contingency planning.

Cummings said she is looking into software tools to help her with contingency planning. In Connecticut, the Comptroller's Office is using SunGard's CBR planning software to help develop, test and maintain its recovery plan. According to SunGard, the software includes models of recovery procedures, plans and documents. Thomas Peraro, disaster recovery coordinator for the Comptroller's Office, said the software has helped the office organize its recovery plans.

Other tools include software that protects users against hard drive problems by automatically switching operations from a system's primary file server to a standby or backup server, which can be located at another site. These kinds of solutions can reduce computer downtime from hours or days to minutes or even seconds.
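The switch-over behavior described above can be illustrated with a minimal Python sketch. It does not represent how any particular commercial product works. The `Server` class, its `is_healthy` probe and the `pick_active` function are hypothetical names introduced only for illustration, and a real tool would also have to keep the standby's data current.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Server:
    name: str
    is_healthy: Callable[[], bool]  # hypothetical health probe, e.g. a disk or ping check


def pick_active(servers: Sequence[Server]) -> Optional[Server]:
    """Return the first healthy server in priority order, or None if all are down."""
    for server in servers:
        try:
            if server.is_healthy():
                return server
        except Exception:
            # A probe that raises is treated the same as an unhealthy server.
            continue
    return None
```

Listing the primary file server first and the standby second means the standby is chosen only when the primary's probe fails. Running such a check every few seconds is what would bring downtime closer to seconds than to hours.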

But no matter how sophisticated the software or how advanced the backup server, nothing beats a well-developed recovery plan that has been tested and is run by well-trained personnel. And to reach that level of preparedness takes resources, a fact that often escapes top-level management as well as budget-pinching legislatures.

Cummings, the only person in Montana's state government trained to handle computer disaster recovery, has 25 percent of her time budgeted for recovery work, although her actual work far exceeds the allotted time. "Upper management is not aware of how much time is actually spent on disaster recovery planning," said Cummings. "And the amount of time needed continues to increase."

Cummings called the lack of resources one of her toughest challenges, and she's not alone. Peraro called the lack of state support in terms of resources one of the biggest hurdles he faced when developing a disaster recovery plan. "Disaster recovery is like buying life insurance," he said. "Everybody knows they need it, but no one wants to think about it."

For more information, contact Leslie Cummings at <lcummings@mt.gov> or Thomas Peraro at 860/702-3603.


--------------------------------------------------------------------------------

Fran Frazzles Durham's Data
Hurricane Fran's 120 mph winds wreaked havoc on North Carolina's coastline last September, but it was the immense amount of rain, which fell far inland, that probably did the most damage.

The city of Durham's MIS department lost five minicomputers to water damage and was left without power for three days, according to a report in Computerworld. Ironically, the outage prevented the city's Unisys mainframe from processing its normal volume of residential and commercial water bills.

With more than 20 years of experience covering state and local government, Tod previously was the editor of Public CIO, e.Republic’s award-winning publication for information technology executives in the public sector. He is now a senior editor for Government Technology and a columnist at Governing magazine.