Data center outages remain common, and a new study released earlier this month identifies three major root causes: uninterruptible power supply (UPS) battery failure, human error and exceeding UPS capacity.
Study of Data Center Outages, released by the Ponemon Institute on Sept. 10, and sponsored by Emerson Network Power, revealed that 91 percent of respondents experienced an unplanned data center outage within the last 24 months, a slight dip from the 2010 survey results, when 95 percent of respondents had reported an outage.
“That to me is probably a wake-up call to most data center professionals who should think seriously about what happens when they have unplanned outages,” said Peter Panfil, vice president of global power sales for Emerson Network Power.
The study reported findings on survey responses from IT professionals nationwide, including 8 percent from the public sector. Fifty-five percent of the survey’s respondents claimed that UPS battery failure was the top root cause for data center outages, while 48 percent felt human error was the root cause. Forty-six percent of those surveyed cited exceeding UPS capacity as a major problem.
During the last few months, two major data center outages occurred within state governments. According to local media, Oregon government services encountered major setbacks after the state’s data center went down in July, resulting in the delay of unemployment payments and a temporary loss of state employee email access. On Sept. 13, a power outage in one of New Jersey’s three state data centers caused a temporary shutdown of the state’s websites and computers.
Panfil said a typical data center operating on 1 megawatt of UPS will have about five strings of batteries, each string containing 40 batteries. Much like a string of Christmas lights wired in series, if one of those batteries fails, the entire string will also fail.
“In many cases, that failure cannot be detected if the battery is just sitting there at an idle state not delivering power,” Panfil said. “As a battery ages and as it starts to fail, its internal resistance goes up and that’s one of its failure mechanisms.”
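Panfil's two points, that a series string is only as good as its weakest battery and that rising internal resistance is an early warning sign, can be illustrated with a short sketch. The string and battery counts come from his example; everything else here is an assumption made for illustration, including the function names, the independent-failure model and the 25 percent resistance-rise threshold, none of which come from the study or from any vendor's monitoring product.

```python
# Figures quoted in the article for a typical 1 MW UPS installation.
STRINGS = 5
BATTERIES_PER_STRING = 40
TOTAL_BATTERIES = STRINGS * BATTERIES_PER_STRING  # 200 batteries to watch


def string_survival_probability(p_battery_fail, n=BATTERIES_PER_STRING):
    """Chance that a series string still works.

    Assumption: each battery fails independently with probability
    p_battery_fail, and the string fails if any one battery fails.
    """
    return (1.0 - p_battery_fail) ** n


def flag_rising_resistance(baseline_mohm, latest_mohm, rise_ratio=1.25):
    """Return the positions of batteries whose latest internal resistance
    reading exceeds their baseline by more than rise_ratio.

    The 1.25 default (a 25 percent rise) is a hypothetical threshold,
    not a figure from the article.
    """
    return [
        i
        for i, (base, latest) in enumerate(zip(baseline_mohm, latest_mohm))
        if latest > base * rise_ratio
    ]


if __name__ == "__main__":
    print("Batteries to monitor:", TOTAL_BATTERIES)
    print("String survival at 1% per-battery failure:",
          round(string_survival_probability(0.01), 3))
    print("Flagged batteries:",
          flag_rising_resistance([4.0, 4.0, 4.0], [4.1, 5.5, 4.9]))
```

Under these assumptions, even a small per-battery failure rate compounds quickly across 40 batteries in series, which is why per-battery readings are more useful than a check on the string as a whole.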
How to Prevent an Outage
According to the survey responses, IT professionals at high-performing data centers recommend the following actions for preventing outages:
- Treat data center availability as the highest priority, above all others including cost minimization and improving energy efficiency;
- Utilize all best practices in data center design and redundancy to maximize availability;
- Dedicate ample resources to getting the data center back up and running in case of an unplanned outage;
- Secure complete support from senior management for efforts to prevent and manage unplanned outages;
- Regularly test generators and switchgear to ensure emergency power in case a utility outage does occur;
- Regularly test or monitor UPS batteries; and
- Implement data center infrastructure management (DCIM).
“No single technology or best practice can completely remove the risk of downtime,” said Larry Ponemon, founder and chairman of the Ponemon Institute. “However, what this report shows us is that by committing the necessary investment in infrastructure technology and resources and taking a number of actions, organizations can dramatically reduce the frequency and duration of unplanned data center outages that can potentially cost data centers thousands of dollars per minute.”