While officials initially feared a cyberattack, the cause was swiftly traced to an internal maintenance issue.
An outage of the Michigan state government network last week forced a rash of office closures and temporary service delays, but the state's IT staff was able to remedy the issue relatively quickly.
Initial fears that the outage might have been the result of a cyberattack were quickly assuaged after officials discovered that it stemmed from an internal maintenance issue that generated an unmanageable amount of network traffic.
Higher-than-normal resource utilization by an internal server generated so much traffic that the government's Domain Name System (DNS) servers, which translate domain names into IP addresses so that end users can reach websites, were forced into a "hung" or unresponsive state, said Caleb Buhs, director of communication for the state's Department of Technology, Management and Budget (DTMB).
"We noticed an inability to resolve network addresses," said Jack Harris, the state's chief technology officer, in an interview with Government Technology. "We had a server that was trying to ping a lot of devices on our network to get information and it hadn't been throttled by the administrator, so it was doing an awful lot in a short amount of time." The larger-than-normal traffic load left the network's resolvers unable "to resolve all the requests," he said.
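For readers unfamiliar with the term, throttling simply means capping how fast a job is allowed to send requests. The Python sketch below is a generic illustration of that idea and is not the state's tooling or configuration: the function `throttled_sweep`, its `max_per_second` parameter and the `probe` callback are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, Iterable, List


def throttled_sweep(
    targets: Iterable[str],
    probe: Callable[[str], object],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Call probe(target) for each target, spaced at least
    1/max_per_second seconds apart (all names here are hypothetical)."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")

    interval = 1.0 / max_per_second  # minimum spacing between probes
    results: List[object] = []
    next_allowed = clock()  # earliest time the next probe may start

    for target in targets:
        now = clock()
        if now < next_allowed:
            # Wait rather than sending a burst of requests.
            sleep(next_allowed - now)
        results.append(probe(target))
        next_allowed = max(now, next_allowed) + interval

    return results
```

Calling `throttled_sweep(devices, check_device, max_per_second=50)` would, under these assumptions, hold a sweep to roughly 50 probes per second regardless of how many devices are on the list; without such a cap, the job runs as fast as the server can send.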
The resultant network problems, though lasting only a matter of hours, caused significant disruption that day and forced a shutdown of Secretary of State offices across the state because staff could not access the Internet. Most agencies, with the exception of the State Police, Governor’s Office, and Lottery and Gaming Control Board, were affected, and service delivery had to be temporarily managed through workarounds, Buhs said.
After noticing the abnormalities around 2:30 p.m. on Nov. 26, the state's security and network operations centers immediately set to work trying to determine whether the surge had been instigated by an external actor. They found, eventually, that the surge was the result of maintenance work being performed, said Harris. Service to many agencies was restored several hours after the initial trouble.
Harris, who has worked with the state for nearly two decades and who took over the CTO position in May, has made it a goal to advance network strategies that promote and enhance direct IT services. Looking ahead, he said a few priorities could help avoid a similar incident in the future.
One is increasing the state's internal DNS capacity, which would expand its ability to take on a high volume of requests. That's a goal that can be accomplished in the short term, Harris said. A longer-term goal, meanwhile, will be to examine the internal DNS architecture to see if there are alternatives that can be used to make it more "anti-fragile," he said.
"I was pleased to see how our network and security operations staff worked together on this so quickly," Harris said, adding that their office had been "training for an incident like this for some time."