Business Continuity: Muddling Through
Institutional improvisation and disaster-proofing vital government functions.
"Muddling through" remains an under-appreciated art form, even four decades after Yale political scientist Charles Lindblom offered it as a scientific theory for explaining the workings of government. The term has long since been appropriated for pejorative uses, but the underlying idea provides a good starting point for examining business continuity in the public sector.
In 1959, Lindblom observed that fully informed public policy decisions were out of reach because they assumed "intellectual capacities and sources of information that men simply do not possess." In late 2001, in an early speech on homeland security, former House Speaker Newt Gingrich made the same point when he noted, "Whatever is planned for won't happen."
The four corners of muddling through, however, intersect nicely with an incremental approach to business continuity:
1. Clarity of objective
2. Step-by-step processes that build out from current situations
3. "Good" is based on most appropriate means to achieve ends
4. Overcoming fear and making decisions with best available information
Muddling through, in its non-pejorative sense, reflected both the experience and orientation of the IPC business continuity. Even though the represented jurisdictions had well-conceived and tested plans, they recognized that effective business continuity planning must confront the axiom that "generals are always preparing to fight the last war." The unthinkable is now possible, and while well-drilled planning is irreplaceable, so too is the flexibility to manage whatever happens with speed and effectiveness.
Clarity of Objective
Military organizations do some of their best work when everything goes wrong. Continuous training and preparedness help, as do dedicated people. But there is also "commander's intent" - a simple, memorable statement of the objective. Even if personnel are left with none of the planned resources and have to improvise all or part of a solution, they have a clear understanding of the mission.
The commander's intent that focused Y2K remediation - the largest software maintenance and business continuity effort on record - has broader application here. Rather than trying to fix every system, the objective is more properly seen as "ensuring no loss of vital public services and no loss of public accountability."
According to the Business Continuity Management (BCM) literature, business continuity is defined as: (a) an "on-going and comprehensive process... that includes disaster recovery, business recovery, business resumption, and contingency planning;" and (b) a "major factor in an organization's survival during and after a disruption" including "traditional emergencies like fires, floods, earthquakes, and tornados, as well as risks from physical and cyber terrorism, cyber crime, computer and telecommunications failures, theft, employee sabotage, and labor strife." The overarching objective of BCM is - according to The Definitive Handbook of Business Continuity Management - "uninterrupted provision of operations and services."
To that end, business continuity can be seen as a cluster of five nested functions. "Contingency planning" is the umbrella over the other four. Nested within it is "business resumption," within that "business recovery," then "technology support," and finally "disaster recovery."
It is the responsibility of the operating agencies, which best know the lines of business, to develop contingencies for the core functions of government - but not in isolation from third parties that will be part of the response.
The lifecycle of business continuity - prevention, detection, response and recovery - takes place in an environment of continual preparation. Contingency planning is focused on institutional improvisation - thinking through stopgap business recovery measures to provide rudimentary services until normal business functions can be fully resumed. Business recovery has long relied on manual and other means to bridge interruptions to the normal way of doing things. Increasingly, business recovery is being accelerated by technology support that creates alternative channels until the primary means of service delivery is restored. Importantly, information technology (IT) now undergirds both business recovery in the interim and the full resumption of business as usual.
The disaster itself requires response and recovery by both the business and IT functions of each organization. Working back from the event or disaster, business units and IT organizations necessarily perform independent analyses of the impact on their respective functions. Each will have its own contingencies to recover those functions. That said, IT is often one of the contingencies for a business unit's recovery - and is mission-critical to optimized and timely business resumption.
Step-by-Step Processes That Build Out from Current Situations
Muddling through is, at its core, an incremental approach to governing. It is appropriate, then, that contingency planning sets out a step-by-step process for the care and feeding of vital public processes following a disaster or other disruption.
Developing response processes relies on a granular understanding of day-to-day operations to help determine how much loss could occur before services would be degraded beyond a sustainable level. To that end, a number of jurisdictions have created inventories of the functions that constitute essential services, asking of each:
1. What is done now?
2. What would be done in the event of a disaster or other disruptive event?
3. What are the likely impacts if a particular function were no longer available?
4. What remedial steps are available to provide interim service?
Identifying mission-critical functions and systems is necessary but not sufficient in developing business continuity processes. The real value lies in prioritizing services that each appear critical on their own but whose comparative impacts differ vastly when considered side by side.
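The inventory questions and the side-by-side prioritization described above could be sketched as a small data structure. This is only an illustration under stated assumptions: the field names, the 1-to-5 impact scale, the tolerable-outage figure and the sample entries are all hypothetical placeholders, not drawn from any jurisdiction's actual inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EssentialFunction:
    """One inventory entry. All field names and scales are hypothetical."""
    name: str
    done_now: str                    # 1. What is done now?
    done_in_disruption: str          # 2. What would be done in a disruption?
    impact_if_lost: int              # 3. Likely impact, assumed scale 1 (minor) to 5 (severe)
    tolerable_outage_hours: float    # Assumed measure of how much loss is sustainable
    interim_steps: List[str] = field(default_factory=list)  # 4. Remedial steps


def prioritize(inventory: List[EssentialFunction]) -> List[EssentialFunction]:
    """Rank functions side by side: highest impact first,
    then shortest tolerable outage as the tie-breaker."""
    return sorted(
        inventory,
        key=lambda f: (-f.impact_if_lost, f.tolerable_outage_hours),
    )


# Invented sample entries, for illustration only.
inventory = [
    EssentialFunction("Permit renewals", "Online portal", "Paper forms at counter",
                      2, 72.0, ["Extend renewal deadlines"]),
    EssentialFunction("Emergency dispatch", "Computer-aided dispatch", "Radio and paper logs",
                      5, 0.5, ["Switch to backup call center"]),
    EssentialFunction("Benefit payments", "Electronic transfer", "Manual checks",
                      5, 24.0, ["Issue emergency checks"]),
]

ranked = prioritize(inventory)
for rank, f in enumerate(ranked, start=1):
    print(f"{rank}. {f.name} (impact {f.impact_if_lost}, "
          f"tolerable outage {f.tolerable_outage_hours}h)")
```

Two services with the same impact score are separated here by how long an outage can be tolerated, which is one simple way to surface differences that are invisible when each service is judged in isolation. A real inventory would likely weigh more factors than these two.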
The Internet has muscled its way onto the prioritization list as governments have come to rely on it for the delivery of mission-critical services. That shift has not been met with a corresponding level of effort to integrate government Web properties into business continuity planning. Too often, they are hosted or managed by other units of government that are not part of the business owner's planning process. The lapse underscores the importance of bringing the best thinking from business units, the information technology organization, and day-to-day operations staff to define the normal business environment - and the best approach to dealing with the extraordinary.
Such an understanding allows an organization to lessen the impact of any event by creating safeguards proportionate to the perceived level of threat based on the criticality of the service.
Clearly, contingency planning cannot be separated from the conduct of an agency's business. Contingency plans are better for knowing how things really work on the ground, as opposed to how they are prescribed in a memo. That information can only come from operational staff who, like the business managers and executives, also need to understand the plan, its provisions and its implied commander's intent. (In addition, the upfront involvement of operations and program staff provides a first-line offense against internal inertia that is wedded to the way things have always been done, and can be dismissive of yet another attempt to get the organization's house in order.)
The hard work often comes between the time the plan is drafted and the advent of the disaster - hard work in terms of developing the discipline and allocating the time necessary to train, test, rehearse and communicate across the organization about business continuity.
"Good" Based on Most Appropriate Means to Achieve Ends
No loss of vital public services is an undeniable "good," particularly as it relates to the confluence of public safety, health, transportation and critical infrastructure that is the pressure point during times of emergency.
To realize that good, a formal framework needs to be in place that provides: