Why Amazon EC2 Outage Won't Change Anything in Cloud Computing

Warning: strong cloud advocacy ahead.

With the recent Amazon EC2 outage, many Internet companies, such as the walled garden Quora and even one of our competitors, remained offline for 24 hours or more. Amazon does not seem to have fully recovered yet, after almost 48 hours, which is a bit disappointing. I have been barely affected because, of all the cloud-based services I use, only Springpad is hosted in the Northern Virginia Amazon datacenter.

As expected, Amazon turned into an evil corporation in no time. Everyone likes to point fingers because it's easy and makes you feel important ("See? I told you the cloud is bad!"), but it's useless. Sure it's Amazon's fault, but what then? It was not the first and won't be the last outage.

The problem here is not reliability and availability of cloud platforms. The problem is the cost of not building on top of a cloud platform and here is why.

Without getting too technical, let's start with the basics. If you want to build a business offering a service to a certain number of users, you have to guarantee a minimum level of availability and reliability. So you'll need at least two servers, because if one decides to take a nap, you'll need another one to keep the service operational. That means you'll need either a fault-tolerant load balancing system, which uses both servers in parallel to balance workload, or "simply" a way to transparently fall back to the other server when the first goes offline (plus you'd have to guarantee data redundancy). It turns out neither way is simple or cheap. Then you also have to keep the operating systems up-to-date and correctly configured. When you want to deploy a new version of your software, you'll need a way to do it transparently on both servers and without service disruption. If you're like me, you'll want to deploy a new version at least every week.

So one day your business catches on and you need to add another server. And another. And another. Because at peak hours you have lots of concurrent users. You see, every time you add a server, you have to maintain it, replace faulty components, and remember and manage the differences between the machines, all of this while accumulating a growing pile of obsolete hardware. You'll begin asking yourself questions like "We're running out of disk space, should we replace all those 3-year-old hard drives, or should we simply throw away the servers and replace them with faster, bigger, better new machines?" or "Should we upgrade the operating system to get this new feature, with the risk of breaking something in the process?". You can see a pattern emerging: you resist change, and that's particularly bad in IT.

Now, if you think outside the box, you can see that you spend tons of money on:

  • hardware that is unused most of the time (because you have to design capacity for peak times) and that you'll have to throw away in a couple of years anyway
  • time you don't really have when you're small and that is very expensive even when you can afford it
  • know-how that is really boilerplate, because your customers can't even tell the difference between their DSL modem and your load balancer, so why should you waste brain cells for that?
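
To put a number on the first point, here is a rough sketch comparing the server-hours you pay for when you own enough machines for the peak with the server-hours you would pay for if capacity followed demand hour by hour. The traffic figures, the `USERS_PER_SERVER` capacity and the function names are all made up for illustration; they are assumptions, not measurements from any real service.

```python
import math

def servers_needed(concurrent_users, users_per_server):
    """Servers required to handle a given load (always at least one)."""
    return max(1, math.ceil(concurrent_users / users_per_server))

def owned_server_hours(hourly_load, users_per_server):
    """Owned hardware is sized for the peak and runs around the clock."""
    peak = max(hourly_load)
    return servers_needed(peak, users_per_server) * len(hourly_load)

def metered_server_hours(hourly_load, users_per_server):
    """Metered capacity is resized every hour to match the load."""
    return sum(servers_needed(users, users_per_server) for users in hourly_load)

# Hypothetical day: 20 quiet hours and 4 peak hours (assumed numbers).
hourly_load = [200] * 20 + [2000] * 4
USERS_PER_SERVER = 500  # assumed capacity of one server

owned = owned_server_hours(hourly_load, USERS_PER_SERVER)
metered = metered_server_hours(hourly_load, USERS_PER_SERVER)
utilization = metered / owned

print(f"owned: {owned} server-hours, metered: {metered} server-hours")
print(f"the owned hardware is busy {utilization:.0%} of the time")
```

With these invented numbers the owned servers sit idle for well over half of the day. The sketch deliberately ignores redundancy, price differences between owned and rented hours, and the time it takes to start new machines.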

The solution is offered by cloud computing platforms, like Amazon EC2, Windows Azure and a few others. You purchase computing power, storage space and bandwidth and let the provider figure out all the rest. You get to focus on building your service and providing value to your customers. Look at how things have changed over the past years:

  1. at first you could shell out largish amounts of money to purchase or rent a server and place it in a real datacenter
  2. then the first virtual server providers started to pop up, allowing you to rent a virtual machine for a fraction of the cost of a physical one, effectively lowering the barrier to start a new Internet-based service (yet the process was similar to renting a physical server, with long setup time and non-zero setup fees)
  3. lately you can sign up, pay with a credit card and have a virtual server up and running in minutes, with the possibility to create and destroy virtual servers at will without additional costs (this opened up lots of new possibilities, lowering the barrier even more)
  4. today you can build an application and deploy it in "the cloud", almost without having to worry about servers, either physical or virtual, and just knowing that you can tap into virtually unlimited computing power and storage, with transparent failover and load balancing goodness, all of that with metered pricing and no long-term contracts.

I consider only the fourth step in this evolution a real cloud computing platform. If you still have to manage individual machines, even if they're virtual, there's not much improvement other than some economic advantage (which is still not irrelevant, of course).

Getting back to the original topic, Amazon's outage: even without considering all the enormous advantages cloud platforms have to offer, we must ask ourselves if we're better at managing servers than a full-time, dedicated team of professionals like Amazon's. If even they fail sometimes, how can we be better? This reasoning is especially important for small companies - or startups - where the CEO writes code, does customer support and pays phone bills. We already have enough overhead that we can't avoid, so trying to move the burden of managing servers and infrastructure onto someone else is a wise move that allows us to focus on real work.

To conclude this post, I think that Amazon was a pioneer in cloud platforms and still has a dominant position in the market. Their outage was felt as outrageous and huge, but don't forget that there are hundreds of outages every day in small datacenters all over the world, so we're better off having our application hosted on Amazon EC2 or Windows Azure. It's cheaper, easier, safer and even greener.

Picture by The Planet.

Scared Of Metered Pricing

This is what we have come to understand lately: pay-as-you-go scares customers away.

The assumption behind a pay-as-you-go, or metered pricing plan is that we will think about it as fair. The more we consume, the more we pay. Simple. After all, there are a lot of things, typically utilities, that are metered: electrical power, water, heating gas, phone calls and even Internet usage in some cases. Every month we're charged a different amount but we regard that as perfectly normal.

The problem with Internet-based services is that they're never enough. We want even more bandwidth, even more minutes of usage, even more gigabytes of storage. In a word, everything must be unmetered. Or, much better, free.

And then there's cloud computing, like that offered by Microsoft, Amazon and Google. Companies purchase computing power in 1-hour blocks, storage in GB/month, and so on, to offer a service to end users. It's a typical metered model, designed to allow flexible and intelligent resource usage. This allows companies to build services on top of cloud computing platforms, and to dynamically scale resources depending on the actual number of users they have at any given moment. Large companies take on the burden of purchasing thousands or millions of servers, acting like buffers, while allowing smaller ones to purchase computing power at will, without any kind of commitment (except some technological lock-in - but that's for another post).

So, it seems perfectly fine to apply a similar pricing model to end users, but it's not, and here is why.

For companies using cloud computing platforms, like ours, estimating their computing power needs is relatively easy. It's a matter of doing some tests and then some very basic calculations. But when we offer a pay-as-you-go service, we not only have to convince the potential customer that we're useful to them, but most importantly that they'll pay a reasonable amount (and what counts as reasonable varies from one person to another) for the service. If that amount is unknown or variable, then we've lost already.

The conclusion is: a periodic subscription including a given amount of resources is much better, for two main reasons:

  1. it's easier to understand because pretty much everything works like that on the Internet
  2. knowing the amount to pay in advance gives the customer a sense of control which is completely absent in a metered model.

Funnily enough, even if the potential customer has to pay a bigger amount simply because they're not using all the resources they're given, knowing it in advance will make a huge difference. I have to admit that I was warned about that, but I grossly underestimated its effects (perhaps because I'm an engineer and calculating things is normal to me - normal people not only look at the bottom line, but they also demand to know it in advance).
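
Here is a small sketch of that trade-off, with prices kept in cents to avoid rounding surprises. The usage figures, the unit price and the flat fee are invented for the example (they are not our actual prices), and overage beyond the included quota is not modeled because the sample usage never exceeds it.

```python
PRICE_PER_UNIT_CENTS = 10   # assumed metered price per unit of usage
FLAT_FEE_CENTS = 1000       # assumed monthly subscription fee
INCLUDED_UNITS = 100        # assumed quota included in the subscription

def metered_bill(units_used):
    """Pay-as-you-go: the bill follows usage, so it changes every month."""
    return units_used * PRICE_PER_UNIT_CENTS

def subscription_bill(units_used):
    """Flat fee: same bill every month while usage stays within the quota."""
    if units_used > INCLUDED_UNITS:
        raise ValueError("overage is not modeled in this sketch")
    return FLAT_FEE_CENTS

# Hypothetical usage over six months (assumed numbers).
monthly_usage = [40, 65, 30, 80, 55, 45]

metered = [metered_bill(u) for u in monthly_usage]
subscription = [subscription_bill(u) for u in monthly_usage]

def spread(bills):
    """Gap between the highest and lowest monthly bill."""
    return max(bills) - min(bills)

print("metered total:", sum(metered) / 100, "spread:", spread(metered) / 100)
print("subscription total:", sum(subscription) / 100, "spread:", spread(subscription) / 100)
```

With these numbers the subscriber pays almost twice as much over six months, but the bill never moves, and that predictability is exactly what the customer is buying.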

There is also another aspect to consider. Metered billing triggers a weird behavior in users: they'll try to save every single penny they can. It's mostly irrational because the time they waste doing so is much more valuable than the actual money they're saving. I believe it's something that we all have deep inside us. An instinct, if you like.

BTW, I'm not sure which term feels scarier, metered or pay-as-you-go. I'll leave you to figure that out.

Microsoft's Approach To Open Source

Read this:
Orchard is a free, open source, community-focused project aimed at delivering applications and reusable components on the ASP.NET platform. [...] In the near term, the Orchard project is focused on delivering a .NET-based CMS application that will allow users to rapidly create content-driven Websites, and an extensibility framework that will allow developers and customizers to provide additional functionality through module extensions and themes.

I discovered today that Orchard is built by the Outercurve Foundation, which was created by Microsoft.

Orchard is gaining momentum and for good reasons: it seems like a really good piece of software. The point is, if Outercurve's goal is to enable exchange of code and understanding among companies and open source communities, perhaps, just perhaps, building a product that is a direct competitor of other existing open-source products, like Umbraco, does not seem like a good way to improve karma.

There are a few other cool projects by Outercurve, like NuGet, that make sense because there are no competitors and no alternatives, either free or commercial (I'm talking about the .NET world here). That said, both projects (Orchard and NuGet) have been built and promoted by top Microsoft developers.

I can see only two alternatives:

  1. if Microsoft needs a Foundation as a way to act in stealth mode in the open source community, without raising too many eyebrows, then perhaps that's a bad move as everyone would love Microsoft to be even more open to open source (ASP.NET MVC is open-source, for example) - even so, it's a giant failure because I was convinced Microsoft was officially endorsing both Orchard and NuGet until today
  2. if Outercurve is really an independent entity (but I doubt it) then they have failed too because their work seems like Microsoft's.

All of that said, it should be noted that Microsoft is giving us a lot of highly appreciated support for ScrewTurn Wiki and is doing the same for other open source projects. They are committed to making open source a part of their business, but I believe they should be more aggressive and transparent. I'm still puzzled by open-source business models (e.g.: giving a product away for free and then selling support feels like, uhm, ransom), but I also think that Microsoft doesn't really need a business model other than putting money into open source and getting better karma in return.

In the end, it's all about developers and karma. In IT, new things emerge from the low levels: from individual developers inventing new ways of doing things, releasing tools as open source, adopting concepts from other fields (or technology stacks). So far, ASP.NET MVC, which we use in Amanuens, is the only notable answer to this movement from Microsoft. It's a giant step for a company with the agility of a frozen mammoth, but it's not yet enough. I guess they know that, but a half-assed attempt like Outercurve is not exactly a good start.

Small Changes To Boost Morale

When you're in a dark mood, everything seems to go wrong and weird bugs pop up out of nowhere, there's little you can do to boost morale and get back to happiness.

One thing that works well for me is changing something.

Mobile Twitter client, desktop wallpaper, desk orientation, whatever can be changed with minimum effort.

That small change gives me the feeling of freshness I need to get back to work with renewed energy.

Try it, it's surprisingly effective.

TrendMicro SafeSync

Lately I felt the need to have all my 60 GB of personal data stored in the cloud, mostly because maintaining a backup plan is a pain. Besides, I'm using an oldish PC with a dual-core processor, 4 GB of RAM and Windows Server 2008 as a NAS, which is a bit overkill and wastes a lot of energy, even if it wakes automatically at a certain time of the day, performs the backup and then shuts down.

Among all the alternatives, TrendMicro SafeSync (formerly known as Humyo) seemed the most affordable service. Most notably, it offers unlimited space for less than 45€/year and also sports an Android client. After a whopping 30-day free trial, I decided to purchase the service as everything looked great. I was happy.

I started pushing several gigabytes worth of data, and then all hell broke loose. To cut a long story short, I got 300+ files randomly moved to SafeSync's recycle bin and, after two days of retries, crashes and what-not, 49 files were not getting synchronized, no matter what I did. Preemptive comment: those files were not locked by other programs.
Moreover, upload and download speed was awful. I never got anything above 300 kbit/s on a symmetric 10 Mbit/s connection.
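
To give an idea of what that speed means, here is a back-of-the-envelope calculation of how long the full 60 GB would take at 300 kbit/s compared with the 10 Mbit/s the line can nominally do. It assumes decimal units (1 GB = 10^9 bytes), a constant transfer rate and no protocol overhead, so treat the results as rough estimates rather than something I measured.

```python
SECONDS_PER_DAY = 86_400

def transfer_days(size_gb, rate_kbit_per_s):
    """Days needed to move size_gb gigabytes at a constant rate in kbit/s."""
    bits = size_gb * 8 * 10**9               # decimal gigabytes to bits
    seconds = bits / (rate_kbit_per_s * 1000)
    return seconds / SECONDS_PER_DAY

DATA_GB = 60  # size of the data set mentioned above

observed = transfer_days(DATA_GB, 300)       # best speed I ever saw
nominal = transfer_days(DATA_GB, 10_000)     # full 10 Mbit/s line speed

print(f"at 300 kbit/s: about {observed:.1f} days")
print(f"at 10 Mbit/s: about {nominal * 24:.1f} hours")
```

In other words, two and a half weeks of uninterrupted uploading in the best case, against a little over half a day if the line were used fully.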

I simply cannot force myself to trust a data storage service like that, so I looked on TrendMicro's support site, discovered that I could get a full refund and applied for it. This morning, not seeing any refund in my PayPal account, I sent an email to TrendMicro's support service and in less than 30 minutes I got a response. Funnily enough, in the meantime I received the refund...

So, kudos to TrendMicro for their fast customer service, but that does not make SafeSync any better...

And now, off to Dropbox. Terribly expensive, but rock-solid. And it has LAN Sync too!