April 23, 2011

Amazon Instability Shakes the Cloud: What's Next?

There’s a popular sports adage that goes something like: “You’re never as good as you look when you’re winning, and never as bad as you look when you’re losing.” I’m not sure who said it first, but I think the wisdom applies beyond sports to many aspects of life and business – including the management of computer operations connected to the global Internet in 2011.

Make no mistake, the latest outage of Amazon’s Elastic Compute Cloud (EC2) is a very big deal. Coverage has landed on the front pages of major newspapers and magazines around the world. A Forbes blog boldly proclaimed, “The Day The Cloud Died.” Another article, entitled “Amazon EC2 outage downs Reddit, Quora,” described some of the impacts. That article opens with a tweet that reads, “The sky is falling! Amazon’s cloud seems to be down (raining?) so we’re experiencing some issues too. Be back soon!”

As of Saturday morning, April 23, the Amazon Web Services Health Dashboard still showed Amazon Elastic Compute Cloud (N. Virginia) in red, with a status of “Instance connectivity, latency and error rates.”

Government Technology Magazine online initially covered the story two days ago, stating, “Since its launch in 2006, Amazon EC2 has been one of cloud computing’s greatest success stories.”

Even from as far away as Australia (the outage was in northern Virginia), ZDNet exclaimed, “Amazon Outage ends cloud innocence.” Here’s an interesting quote from that article:

“Cloud computing learned the harsh reality of resiliency as Amazon Web Services' outage crossed into its second day. Meanwhile, start-ups and a host of other AWS customers are in uncharted waters….

Given that AWS' North Virginia data center has been out of whack for more than 24 hours, following a ‘networking event’ that led to problems with how data is mirrored, it's clear you need to procure more than one cloud. You need a backup for your cloud provider's backup.”

It’s clear that this week’s outage has once again shaken trust in cloud computing. No, this is not the first time. Indeed, if you search Google for the words “Amazon Outage,” you are offered results from 2008, 2009, 2010 and 2011. However, this latest Amazon “cloud earthquake” (or, if you prefer, tsunami) looks to be much bigger than anything before, with significantly wider customer impact and more worldwide attention.

While some competitors are no doubt smiling as an industry leader like Amazon takes a beating, I expect other large cloud providers to move quickly to limit the damage to the red-hot cloud marketing label. They will tell us why their products and services are different and/or why this type of incident could never happen to them. Don’t believe such bold statements. We've learned multiple times over the past few years that even the best-known providers, such as Google and Microsoft, as well as outsourced public sector data centers, experience major unplanned outages that impact customers.

Indeed, this article points out the importance of cloud computing architectures and of building redundancy into your technology design:

“Netflix, a large AWS user, has institutionalized this in their deployment model. In fact they frequently let loose their Chaos Monkey that constantly forces random failures of even stable AWS instances to ensure recovery. Unlike Foursquare, Quora and Hootsuite, Netflix did not report any failures during the current AWS east region outage. A prominent federal government website running on AWS also recovered quickly and gracefully in another AWS region.

So while the failures have been catastrophic, perhaps embarrassing and will hopefully prompt a review of application deployment and recovery strategies, they are not serious enough to change the dynamics of cloud adoption in short or long term. The benefits of on-demand cloud infrastructure -- such as rapid cycle time, lower capital costs and utility pricing models -- remain strong cloud drivers today, just as they were last week.”
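The idea in the Netflix example, deliberately killing healthy instances so that failover gets exercised before a real outage does it for you, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: `Region`, `chaos_monkey` and `route` are hypothetical names, the "regions" are plain in-memory sets rather than real AWS resources, and nothing here uses Netflix's actual Chaos Monkey tool or any AWS API.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Region:
    """A hypothetical deployment region holding a pool of instance IDs (in memory only)."""
    name: str
    healthy: set = field(default_factory=set)

    def is_up(self) -> bool:
        # A region can serve traffic while at least one instance is healthy.
        return bool(self.healthy)


def chaos_monkey(region: Region, rng: random.Random, kills: int = 1) -> list:
    """Terminate up to `kills` randomly chosen healthy instances, Chaos Monkey style."""
    victims = rng.sample(sorted(region.healthy), min(kills, len(region.healthy)))
    for instance in victims:
        region.healthy.discard(instance)
    return victims


def route(regions: list) -> str:
    """Send traffic to the first region, in priority order, that still has a healthy instance."""
    for region in regions:
        if region.is_up():
            return region.name
    raise RuntimeError("all regions are down")


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the simulation is repeatable
    east = Region("east", {"e1", "e2"})
    west = Region("west", {"w1", "w2"})
    print(route([east, west]))  # east
    chaos_monkey(east, rng, kills=2)  # simulate losing every instance in one region
    print(route([east, west]))  # west
```

The point of the design is that failure injection and failover live in the same routine test loop, so a region-wide loss (the second `print`) is a path the system has already exercised rather than a surprise.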

I agree with author Ahmar Abbas’s final assessment. The cloud will move beyond this situation and be just fine, with customers adjusting and implementing needed design modifications. Amazon will probably come back stronger. Services may cost customers a little more, but new, secure cloud offerings are the future for all of us. In the meantime, private cloud adopters will temporarily gloat and say, “I told you so.”

So my advice to public sector leaders is to move ahead with cloud computing plans as before, with appropriate caution. Look at the many private cloud options (like our current Michigan MI-Cloud) or hybrid public/private alternatives. Keep watching FedRAMP’s progress for secure opportunities to improve efficiency.

But taking a big step back, this Amazon situation is just another example of the adage that opened this post. Pride comes before a fall in all areas of life. Smart people know that no cloud vendor is perfect or invincible. The number of large-scale breach announcements should teach us that.

We can also learn this same lesson from sports. If, despite the predictions of the best experts, the New York Yankees, with their huge payroll, can get rocked by the Texas Rangers in the 2010 American League Championship Series, then a major cloud provider like Amazon can, and will, go down as well. Yes, it will happen again to another cloud provider, so get used to it.

When expectations are too high, bad things happen – even to the top sports teams. Likewise, the best global corporations sometimes perform poorly, despite their focused efforts. We must prepare for the unexpected – even in the cloud.