Human error when repairing a storage area network caused data loss and delay, concludes audit of August 2010 computer failure that hampered more than two dozen Virginia agencies.
A major computer failure that knocked out service to more than two dozen Virginia government agencies last summer was exacerbated by human error, according to an audit released Tuesday, Feb. 15.
The finding, announced jointly by Gov. Bob McDonnell and the state’s legislative audit committee, blamed the severity of the computer failure partly on maintenance performed on a storage network at the state’s data center in Richmond. The failure affected 26 of Virginia’s 89 agencies in late August, some for as long as a week.
Data corruption occurred across several critical systems when a memory board was replaced on the storage network, a DMX-3 Series manufactured by EMC. The audit said two memory boards were reporting errors, and that they were replaced in the incorrect order.
The protracted incident was the latest blemish on the enterprisewide IT partnership struck between the Virginia Information Technologies Agency (VITA) and Northrop Grumman in 2006. Under the 10-year, $2.3 billion contract, Northrop Grumman provides the state’s IT services. But the project, the largest of its kind among the nation’s state governments, has been plagued by delays, cost overruns and poor service.
Virginia CIO Sam Nixon, appointed by the governor to fix the wide-ranging partnership, wrote in a letter to the Joint Legislative Audit and Review Commission over the weekend that the audit confirmed the state’s basic understanding of the incident.
“Considerable progress has been made to consolidate, centralize and standardize Virginia’s infrastructure into a cohesive, secure platform across the enterprise of state government,” Nixon wrote. “All but a few state agencies have been transformed. But as this report indicates, the work to ensure that the Commonwealth enjoys the benefits of transformation is ongoing, and requires constant discipline and effort by both the Commonwealth and Northrop Grumman.”
The auditor, Agilysys, said the outage demonstrated that, at the time, Virginia and vendor partner Northrop Grumman lacked adequate self-governance for risk management.
There was also a lack of data protection processes for the state’s key applications, the report said. “A business impact analysis review should be implemented in conjunction with VITA and state agencies to reassess the recovery time and recovery point objectives needed for key data,” according to the audit.
The audit makes 26 separate recommendations for process improvement.
All the state’s corrupted data was recovered two months later, in October.
McDonnell ordered the third-party audit last fall and announced that Northrop Grumman would pay the study’s cost. “Many Virginians were inconvenienced by this disruption and lost hours of their time in dealing with the outages,” McDonnell reiterated via a statement Tuesday. “It was an unacceptable failure and one that cannot be allowed to reoccur.”
Samuel Abbate, Northrop Grumman vice president for the VITA technology partnership, said via correspondence that the company agreed with many but not all of the audit’s findings and recommendations.
“Some of the [audit’s] most important recommendations are matters for the Commonwealth’s policy makers to consider, for example establishing a common definition of critical data and determining what protective measures should be undertaken,” Abbate wrote. “Also, the report observes and is critical of the fact that typical enterprise approaches are not always applied in the program. These observations may be fundamentally correct, but individual agencies, not VITA or Northrop Grumman, retain the ability to deviate from centralized enterprise standards.”