Amazon Instability Shakes the Cloud: What's Next?

You're never as good as you look when you're winning, and never as bad as you look when you're losing. I think that adage applies beyond sports to many aspects of life and business, including the management of computer operations connected to the global Internet in 2011.

April 23, 2011

There’s a popular sports adage that goes something like: “You’re never as good as you look when you’re winning, and never as bad as you look when you’re losing.” I’m not sure who said it first, but I think the wisdom applies beyond sports to many aspects of life and business – including the management of computer operations connected to the global Internet in 2011.

Make no mistake, Amazon’s latest Elastic Compute Cloud (EC2) outage is a very big deal. Global coverage has landed on the front pages of major newspapers and magazines. A Forbes blog boldly proclaimed, “The Day The Cloud Died.” Another article, entitled “Amazon EC2 outage downs Reddit, Quora,” described some of the impacts. It begins by displaying a tweet that says, “The sky is falling! Amazon’s cloud seems to be down (raining?) so we’re experiencing some issues too. Be back soon!”

As of Saturday morning, April 23, the Amazon Web Services Health Dashboard still showed the Amazon Elastic Compute Cloud (N. Virginia) as red, with a status of “Instance connectivity, latency and error rates.”

Government Technology Magazine online initially covered the story two days ago, stating, “Since its launch in 2006, Amazon EC2 has been one of cloud computing’s greatest success stories.”

Even from as far away as Australia (the outage was in northern Virginia), ZDNet exclaimed, “Amazon Outage ends cloud innocence.” Here’s an interesting quote from that article:

“Cloud computing learned the harsh reality of resiliency as Amazon Web Services' outage crossed into its second day. Meanwhile, start-ups and a host of other AWS customers are in uncharted waters….

Given that AWS' North Virginia data center has been out of whack for more than 24 hours, following a ‘networking event’ that led to problems with how data is mirrored, it's clear you need to procure more than one cloud. You need a backup for your cloud provider's backup.”

It’s clear that this week’s outage has shaken trust in cloud computing – again. No, this is not the first time. Indeed, if you try a Google search with the words “Amazon Outage,” you are offered suggestions for 2008, 2009, 2010 and 2011. However, this latest Amazon “cloud earthquake” (or, if you prefer, tsunami) looks to be much bigger than anything before, with significantly wider customer impact and more worldwide attention.

While some competitors are no doubt smiling as an industry cloud leader like Amazon takes a beating, I expect other large cloud providers to quickly try to limit the damage to the red-hot cloud marketing label. They will tell us why their products and services are different and/or why this type of incident could never happen to them. Don’t believe such bold statements. We've learned multiple times over the past few years that even the best-known providers, such as Google and Microsoft, as well as outsourced public sector data centers, experience major unplanned outages that impact customers.

Indeed, this article by Ahmar Abbas points out the importance of cloud computing architectures and of building redundancy into your technology design:

“Netflix, a large AWS user has institutionalized this in their deployment model. In fact they frequently let loose their Chaos Monkey that constantly forces random failures of even stable AWS instances to ensure recovery. Unlike Foursquare, Quora and Hootsuite, Netflix did not report any failures during the current AWS east region outage. a prominent federal government website running on AWS, also recovered quickly and gracefully in another AWS region.

So while the failures have been catastrophic, perhaps embarrassing and will hopefully prompt a review of application deployment and recovery strategies, they are not serious enough to change the dynamics of cloud adoption in short or long term. The benefits of on-demand cloud infrastructure -- such as rapid cycle time, lower capital costs and utility pricing models -- remain strong cloud drivers today, just as they were last week.”

I agree with author Ahmar Abbas’ final assessment. The cloud will move beyond this situation and be just fine, with customers adjusting and implementing needed design modifications. Amazon will probably come back stronger. Services may cost a little more for customers, but new, secure cloud offerings are the future for all of us. In the meantime, private cloud adopters will temporarily gloat and say, “I told you so.”
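For readers who want to picture the kind of design modification described in the quote above, here is a minimal sketch in Python of random failure injection combined with failover to a second region. It is only an illustration under stated assumptions: it does not call any AWS API, it is not Netflix's actual Chaos Monkey, and every name in it (`Instance`, `Region`, `chaos_kill`, `route`, the "primary" and "standby" regions) is hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool = True


@dataclass
class Region:
    name: str
    instances: list = field(default_factory=list)

    def healthy_instances(self):
        """Return the instances in this region that are still up."""
        return [i for i in self.instances if i.healthy]


def chaos_kill(region, rng):
    """Mark one randomly chosen healthy instance as failed, if any remain."""
    candidates = region.healthy_instances()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def route(regions):
    """Pick the first region, in priority order, that has a healthy instance."""
    for region in regions:
        healthy = region.healthy_instances()
        if healthy:
            return region.name, healthy[0].name
    raise RuntimeError("no healthy instance in any region")


def build_regions():
    # Hypothetical two-region layout; the names are illustrative only.
    return [
        Region("primary", [Instance("primary-1"), Instance("primary-2")]),
        Region("standby", [Instance("standby-1")]),
    ]


if __name__ == "__main__":
    rng = random.Random(2011)
    regions = build_regions()
    primary = regions[0]
    # Inject failures until the whole primary region is down, checking
    # after each one that a request can still be routed somewhere.
    while primary.healthy_instances():
        victim = chaos_kill(primary, rng)
        print(f"killed {victim.name}; traffic now goes to {route(regions)}")
```

The point of the exercise is that `route` keeps returning an answer while failures are injected on purpose, so a recovery path is tested long before a real outage forces the question.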

So my advice to public sector leaders is to move ahead on cloud computing plans as before, with appropriate caution. Look at the many private cloud options (like our current Michigan MI-Cloud) or hybrid public/private alternatives. Keep watching the FedRAMP progress for secure opportunities to improve efficiency.

But taking a big step back, this Amazon situation is just another example of the lesson in this blog’s opening words. Pride comes before a fall in all areas of life. Smart people know that no cloud vendor is perfect or invincible. The number of large-scale breach announcements should teach us that.

We can also learn this same lesson from sports. If, despite the claims of the best experts, the New York Yankees with their huge payroll can get rocked by the Texas Rangers in the American League Championship Series in 2010, a major cloud provider like Amazon can – and will – go down as well. Yes, it will happen again to another cloud provider, so get used to it.

When expectations are too high, bad things happen – even to the top sports teams. Likewise, the best global corporations sometimes perform poorly, despite their focused efforts. We must prepare for the unexpected – even in the cloud.


Dan Lohrmann, Chief Security Officer & Chief Strategist at Security Mentor Inc.

Daniel J. Lohrmann is an internationally recognized cybersecurity leader, technologist, keynote speaker and author.

During his distinguished career, he has served global organizations in the public and private sectors in a variety of executive leadership capacities, receiving numerous national awards including: CSO of the Year, Public Official of the Year and Computerworld Premier 100 IT Leader.

Lohrmann led Michigan government’s cybersecurity and technology infrastructure teams from May 2002 to August 2014, including enterprisewide Chief Security Officer (CSO), Chief Technology Officer (CTO) and Chief Information Security Officer (CISO) roles in Michigan.

He currently serves as the Chief Security Officer (CSO) and Chief Strategist for Security Mentor Inc. He is leading the development and implementation of Security Mentor’s industry-leading cyber training, consulting and workshops for end users, managers and executives in the public and private sectors. He has advised senior leaders at the White House, National Governors Association (NGA), National Association of State CIOs (NASCIO), U.S. Department of Homeland Security (DHS), federal, state and local government agencies, Fortune 500 companies, small businesses and nonprofit institutions.

He has more than 30 years of experience in the computer industry, beginning his career with the National Security Agency. He worked for three years in England as a senior network engineer for Lockheed Martin (formerly Loral Aerospace) and for four years as a technical director for ManTech International in a US/UK military facility.

Lohrmann is the author of two books: Virtual Integrity: Faithfully Navigating the Brave New Web and BYOD for You: The Guide to Bring Your Own Device to Work. He has been a keynote speaker at global security and technology conferences from South Africa to Dubai and from Washington, D.C., to Moscow.

He holds a master's degree in computer science (CS) from Johns Hopkins University in Baltimore, and a bachelor's degree in CS from Valparaiso University in Indiana.

Follow Lohrmann on Twitter at: @govcso