Amazon Web Services Outage: Causes And Remedies

I’ve been a big fan of Amazon Web Services (AWS) because they lower the costs of startup experimentation. I’ve sponsored their events, judged their startup competition, and so on, and I have friends on the team. I’ve also had frank conversations with them about service level agreements and what it means to be an infrastructure provider in a mashup world.

Mashups increase the need for high availability and uptime. If the user experience of a mashup application requires, say, five web services from three separate companies to be available, the overall probability of failure goes up substantially: the application is down whenever any one of them is down. It’s the weakest-link-in-the-chain argument.

The Net learned this the hard way yesterday when multiple AWS services (S3, EC2, SQS, SimpleDB, etc.) had a multi-hour outage. The problem was exacerbated by the fact that, internally, various AWS services depend on one another, especially on the storage service, S3. It looks like the cause of the outage was a particular use pattern of S3:

What caused the problem, however, was a sudden, unexpected surge in a particular type of usage (PUTs and GETs of private files, which require cryptographic credentials, rather than GETs of public files, which require no credentials). As I understand what Kathrin said, the surge was caused by at least one very large customer plus several other customers suddenly and unexpectedly increasing their usage.
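For readers unfamiliar with the distinction the quote draws, here is a minimal sketch of what "cryptographic credentials" meant for an S3 request at the time. Under AWS Signature Version 2, the client computed an HMAC-SHA1 over a canonical description of the request and sent it in the `Authorization` header; to verify it, the service had to recompute that signature with the caller's secret key. Anonymous GETs of public objects skip this step entirely. The sketch assumes a request with no `x-amz-*` headers and no sub-resources (such as `?acl`), the function names are illustrative, and the credentials are AWS-documentation-style placeholders. AWS has since moved to Signature Version 4, so treat this as a historical illustration rather than client code to use today.

```python
import base64
import hashlib
import hmac


def build_string_to_sign(verb: str, content_md5: str, content_type: str,
                         date: str, resource: str) -> str:
    """Canonical request description for AWS Signature Version 2.

    Simplification: assumes no x-amz-* headers, so the
    CanonicalizedAmzHeaders component is empty.
    """
    canonicalized_amz_headers = ""
    return (f"{verb}\n{content_md5}\n{content_type}\n{date}\n"
            f"{canonicalized_amz_headers}{resource}")


def sign_v2(secret_key: str, string_to_sign: str) -> str:
    """Base64(HMAC-SHA1(secret_key, string_to_sign))."""
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id: str, secret_key: str, verb: str,
                         date: str, resource: str,
                         content_md5: str = "", content_type: str = "") -> str:
    """Build the 'AWS AccessKeyId:Signature' header value for one request."""
    sts = build_string_to_sign(verb, content_md5, content_type, date, resource)
    return f"AWS {access_key_id}:{sign_v2(secret_key, sts)}"


if __name__ == "__main__":
    # Placeholder credentials in the style of AWS documentation examples.
    header = authorization_header(
        "AKIAIOSFODNN7EXAMPLE",
        "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "GET",
        "Tue, 27 Mar 2007 19:36:42 +0000",
        "/examplebucket/photos/puppy.jpg",
    )
    print("Authorization:", header)
```

Every authenticated PUT or GET carries this kind of per-request verification work, which is one reason a surge in private-object traffic behaves differently from a surge in public GETs.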

I would highly recommend that anyone who is building a developer community, providing SaaS infrastructure, or relying on SaaS infrastructure take the time to read the many posts on the AWS forums about the outage. You hear the real pain and frustration of people whose businesses depend on AWS. The key complaint was not that the service failed (failures do happen) but that Amazon was not prepared to engage with the developer community around the failure.

It’s AmazING to have no info on what’s happening. Absolutely unacceptable. Come on, people on this forum are all tech guys, so we understand that bad things happen from time to time. However, you MUST be transparent with your customers and give them details on what’s going on (yes, we want to know exactly what’s happening, not a standard response like ‘The issue is resolved’). In fact, it is not. So please, escalate these complaints to the right person and post the technical explanation of the issue as soon as possible.

Jesse Robbins over at O’Reilly has a good post comparing how Amazon dealt with the situation to how Salesforce responded to its infamous outage a couple of years ago. I’ve also blogged before about how SaaS brings increased responsibilities.

All in all, Amazon worked very hard to get the issue resolved, and the community was thankful for their efforts.

As I said before, you need to be transparent with your customers. No service can provide 100% uptime. It’s a fact. No matter if you have a redundant anycast network or supercalifragilisticexpialidocious elastic clouds. I just want to get notified and know exactly what’s happening. Nothing else. That said, the issue was resolved very fast, so you should be very proud. Hats off to Amazon’s IT staff.
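No service offers 100% uptime, and the mashup argument at the top of this post compounds that. A rough way to see how much: if a mashup needs every one of its dependencies to be up at once, and their failures are independent, its availability is the product of theirs. The sketch below assumes five dependencies at a hypothetical 99.9% availability each and converts the result into expected downtime per year; the figures are illustrative, not measured AWS numbers, and the function names are made up for this example.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def composite_availability(availabilities):
    """Availability of a system that needs *all* of its dependencies.

    Assumes failures are independent, so P(all up) is the product
    of the individual probabilities.
    """
    return prod(availabilities)


def downtime_hours_per_year(availability):
    """Expected hours per year that a service with this availability is down."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single = 0.999  # hypothetical per-service availability
    mashup = composite_availability([single] * 5)
    print(f"one service:   {single:.4%} up, "
          f"{downtime_hours_per_year(single):.1f} h/yr down")
    print(f"five services: {mashup:.4%} up, "
          f"{downtime_hours_per_year(mashup):.1f} h/yr down")
```

Under these assumptions a single 99.9% service is down about 8.8 hours a year, while the five-service mashup is down about 43.7 hours. Independence is a simplification: this outage shows that services from one provider can fail together because they depend on one another internally. Correlated failures matter most when you try to add redundancy, since a backup that shares the same underlying dependency goes down at the same time.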

About Simeon Simeonov

I'm an entrepreneur, hacker, angel investor and reformed VC. I am currently Founder & CTO of Swoop, a search advertising platform. Through FastIgnite I invest in and work with a few great startups to get more done with less. Learn more, follow @simeons on Twitter and connect with me on LinkedIn.
This entry was posted in amazon web services, SaaS, startups, Web 2.0.

9 Responses to Amazon Web Services Outage: Causes And Remedies

  1. A.T. says:

    And those who had gone down this road were saying “do NOT rely on Amazon only” (http://web.archive.org/web/20070406174427/http://blogs.smugmug.com/don/files/ETech-SmugMug-Amazon-2007.pdf), yet we see those who built on AWS alone now demanding explanations and transparency.

  2. Pingback: Amazon Ran Out of Capacity « SmoothSpan Blog

  3. Pingback: Lessons on How to Use Amazon Web Services « HighContrast

  4. Blagovest says:

    There is a very important question you have to ask yourself before deciding whether to use S3: what are you really looking for? Remote storage, content delivery, or both? The distinction is crucial.

    What I observe is that most people treat Amazon S3 as a content delivery service. While this is not inherently wrong, note that S3 was designed specifically as a STORAGE service. S3 does not claim to be a CDN.

    The point is that, since terabyte hard drives are affordable nowadays and internet traffic grows steadily, the strain falls much more on content delivery and network infrastructure than on storage. If you don’t need remote storage, there are much better services designed specifically for content delivery.

    SteadyOffload.com provides an innovative, subtle and convenient way to offload static content. The whole mechanism is quite different from Amazon S3. Instead of permanently uploading your files to a third-party host, you let their cachebot crawl your site and mirror the content in a temporary cache on their servers. Content remains stored on your server while it is delivered from the SteadyOffload cache. The URL of each cached object on their server is generated dynamically at page load time, heavily scrambled, and changed often, so you don’t have to worry about hotlinking. This means there is almost no chance that the cached content gets exposed outside your web application.

    It’s definitely worth trying, because it’s not a storage service like S3 but a service built specifically for offloading static content.

    Take a look:
    http://video.google.com/videoplay?docid=-8193919167634099306 (the video shows integration with WordPress, but it can be integrated with any other web page)
    http://www.steadyoffload.com/
    http://codex.wordpress.org/WordPress_Optimization/Offloading

    Bandwidth costs under $0.20 per GB: affordable, efficient and convenient. It looks like a startup, but it appeals to me a lot. Definitely simpler and safer than Amazon S3.

  5. Blagovest, thanks for pointing this out.

    I like the idea and how easy it would be to integrate SteadyOffload into a site. I don’t like that there isn’t any information on the site about who these guys are, what their network is like, etc. Also, 99.9% availability is nothing to boast about.

  6. Blagovest says:

    Yep, the main point in my comment was that S3 is a good storage solution but definitely not the best content delivery solution.

    It’s a young startup, still located only in Europe. But the idea seems very promising, doesn’t it? They also provide customers with detailed stats in the service’s control panel, which, by the way, is implemented in Adobe Flex.

    BTW, if you haven’t forgotten Bulgarian yet, we can write in our native language 🙂

  7. leen rose says:

    The writing is really nice.

  8. Pingback: $100,000 to the winning company in the Startup Competition « ATer criacao de sites (11) 2527-3032 / www.ater.com.br
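Blagovest’s description of SteadyOffload’s cache URLs (generated at page load, scrambled, and changed often so that hotlinking is impractical) resembles a common technique: time-limited URLs signed with an HMAC. SteadyOffload’s actual scheme isn’t documented in this thread, so what follows is only a minimal sketch of the general idea, assuming a hypothetical secret shared by the page generator and the cache servers and hypothetical function names (`sign_path`, `verify_url`).

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret shared by the page generator and the cache servers.
SECRET = b"replace-with-a-long-random-secret"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and its expiry time, hex-encoded."""
    msg = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def sign_path(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path plus an expiry timestamp and a signature over both."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if it is unexpired and its signature matches."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # expired, so a copied link stops working
    expected = _signature(parts.path, expires)
    # Constant-time comparison on bytes avoids leaking timing information.
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    url = sign_path("/static/logo.png", ttl_seconds=300)
    print(url, verify_url(url))
```

Because the signature covers both the path and the expiry time, editing either one invalidates the URL, and a link copied to another site keeps working only until it expires.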
