Smart Business spoke to Bill Mathews of Hurricane Labs about what happens when the cloud fails, and how to not panic when it does.
I’ve written quite a bit on why I think businesses should cautiously embrace the cloud and see what happens. I promise it is not as terrible as a lot of folks are telling you, but it does have its faults.
Many folks seem to think that I sing the cloud’s praises and say nothing of its many faults. This is patently untrue. As anyone who knows me will tell you, I find fault in nearly everything, and the cloud is no different. A lot of applications in the cloud have many, many issues. For instance, Twitter, which runs in the “cloud,” has its own share of documented issues. From being over capacity (hello, fail whale) to simply being down, cloud failures do happen, and the cloud is no nirvana. Gmail has developed quite a reputation (unfairly, some would say) for being down. Here’s the challenge I have for you, though: Measure your network and application uptime against theirs. Let me know the results.
One question I am asked: what do you do when the cloud breaks?
Lots of prayer, if you’re into that sort of thing; then you really dig in. If it’s software or infrastructure as a service, you have no choice but to wait. It’s not your code and they’re not your servers, so waiting it out is the only option. I know that sounds terrible (and it is, believe me), but no one is more motivated to keep their systems up than these providers. Every minute they’re not up is a minute they’re not billing you, and they don’t want that. Economically speaking, it’s in their best interest to keep their stuff running. This may sound like common sense, but there are a lot of FUD-spreading folks out there claiming that Google and Amazon just throw things up there and put no thought into it. I’m not going to go so far as to say they never under-think anything, but chances are that if they’re putting something up for you to pay for, they’re going to want to make sure it is available as much as possible. Availability is a big issue in the cloud, and it should be.
My advice is to measure their uptime (the share of time a given system is available) against yours and see what the difference is. If yours is significantly higher than theirs, congratulations: you’re better than some of the biggest tech companies in the world, and you should be proud. If not, you should investigate a little further. If you’re not measuring your uptime at all, then we should have a separate conversation. The point is, don’t be dismissive. You might actually be able to increase your service and decrease your cost, and that sort of thing is truly rare.
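If you want to run that comparison yourself, here is a minimal sketch of the arithmetic in Python. The function name `uptime_percent` and the downtime figures are made up for illustration; plug in whatever your own monitoring and your provider's status history actually report.

```python
from datetime import timedelta


def uptime_percent(period: timedelta, downtime: timedelta) -> float:
    """Return the share of `period` the system was available, as a percentage."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    if downtime < timedelta(0) or downtime > period:
        raise ValueError("downtime must be between zero and the period")
    # Dividing one timedelta by another yields a plain float ratio.
    return 100.0 * (period - downtime) / period


# Hypothetical numbers for one 30-day month; replace them with measured values.
month = timedelta(days=30)
ours = uptime_percent(month, timedelta(hours=4))
theirs = uptime_percent(month, timedelta(minutes=45))

print(f"In-house uptime: {ours:.2f}%")
print(f"Provider uptime: {theirs:.2f}%")
print("We win" if ours > theirs else "Worth a closer look at the provider")
```

The comparison is only fair if both numbers cover the same period and count the same kinds of outages, so decide up front whether planned maintenance counts as downtime.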
What improvements would I like to see from these sorts of providers?
Logs, logs and more logs. Let me know what’s going on with my instance of the application, with a little more truth in monitoring. If something is down, let me know so I can work around it. Don’t make me find out by hitting refresh and waiting until the request times out. Every cloud provider should have both a truthful status dashboard and an emergency broadcast Twitter account (one that maybe sends to Facebook and Google+ too, for good measure) for when there’s an outage. The guys over at 37signals do this very well with their Twitter account whenever Basecamp or their related services are down or have other issues. It wins with their customers because they’re being up front and honest about it. We’ll be launching a few cloud-based services very soon and, believe me, this sort of approach will be baked in.
My overall point isn’t to be a giant cheerleader for the cloud — it doesn’t need me to do that — but to get smart and good people to lay down their fears and try something new. A lot of these folks can bring a lot to the various realms of cloud security and can help make massive improvements. Instead of saying “No, no, no,” I’m just looking for an “Okay, let’s try it out and see what happens.” Is that too much to ask?
Bill Mathews is Lead Geek of Hurricane Labs, an IT security services firm founded in 2004. He has nearly 20 years of experience in IT, 13 of them in information security, and has been interested in security ever since C-3PO told R2-D2 to never trust a strange computer. He can be reached at @billford or @hurricanelabs on Twitter, and other musings can be read on http://blog.hurricanelabs.com.