Traditionally, when a server suffered a catastrophic failure, the sites it hosted went down and stayed down until it could be repaired or replaced. On a long enough timeframe, servers will fail: components don’t last forever, particularly those with moving parts like hard drives, but electronic components wear out too. Some downtime was once an accepted feature of web hosting.
It’s almost impossible to guarantee zero downtime, even on the most advanced cloud platform, but modern cloud technology allows us to implement systems that make it an extremely rare occurrence.
To be considered a High Availability system, a cloud platform needs two closely related features: the absence of a single point of failure, and failover when a server goes down.
The Single Point Of Failure
A server is a complex machine made up of many different components, each of which has to function in order for the server to function. A failure in any one of those critical components can bring down the whole server, so a single server contains multiple single points of failure. Removing single points of failure requires redundancy, so that if one component fails, others can take over.
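To put rough numbers on why redundancy helps, here is a minimal sketch of the standard availability arithmetic. It assumes components fail independently of each other, which real hardware only approximates, and the availability figures and function names are made up for illustration.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability when every component must work: the values multiply."""
    return prod(component_availabilities)


def redundant_availability(availability, copies):
    """Availability when at least one of `copies` identical units must work.

    Assumes the units fail independently of each other.
    """
    return 1 - (1 - availability) ** copies


# Hypothetical figures, for illustration only.
single_server = series_availability([0.999, 0.999, 0.995])
redundant_pair = redundant_availability(single_server, 2)

print(f"single server:  {single_server:.4%}")
print(f"redundant pair: {redundant_pair:.6%}")
```

The first function shows why more components in a chain means lower availability, and the second shows why even one spare makes a large difference.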
Failover systems monitor servers so that when they become unavailable or exhibit a level of performance that falls below predetermined minimums, service can be switched to alternative servers or hardware.
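The monitoring logic described above might look something like the following sketch. The `Server` class, the `probe` callback, and the 0.5 second threshold are all assumptions made for illustration; a real failover system would probe over the network and would usually require several failed checks in a row before switching.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Server:
    name: str
    # Hypothetical health probe: returns the response time in seconds,
    # or None if the server did not respond at all.
    probe: Callable[[], Optional[float]]


def is_healthy(server, max_response_seconds):
    """A server is healthy if it responds within the allowed time."""
    latency = server.probe()
    return latency is not None and latency <= max_response_seconds


def choose_active(servers, max_response_seconds=0.5):
    """Return the first healthy server in priority order, or None."""
    for server in servers:
        if is_healthy(server, max_response_seconds):
            return server
    return None


# Example: the primary is unreachable, so service switches to the standby.
primary = Server("primary", probe=lambda: None)
standby = Server("standby", probe=lambda: 0.05)
active = choose_active([primary, standby])
print(active.name if active else "no healthy server")
```

Note that a server counts as failed both when it is unreachable and when it responds too slowly, matching the two conditions in the paragraph above.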
The Cloud Makes High Availability More Economical And Efficient
With traditional hosting, high availability requires ever-ready redundant servers. That’s expensive, because each standby server costs money and sits idle most of the time.
The cloud lets us be smarter about redundancy. Cloud servers run on top of a virtualization layer. They’re essentially software, which lets us do things with them that we couldn’t reasonably do with physical servers, including replicating them, backing them up, and moving them between physical servers.
Of course, underneath the virtualized servers are physical servers, and, as we’ve already discussed, physical servers will fail. But virtual cloud servers aren’t tied to a particular physical server. If the physical server fails, failover systems will notice that it and the virtual machines it hosts are unresponsive. The cloud servers can then be moved to another hardware node with minimal downtime.
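As a rough illustration of that last step, the sketch below reassigns virtual machines from a failed hardware node to the surviving nodes with the most free capacity. The data layout, the node and VM names, and the `evacuate` function are hypothetical; real cloud platforms use far more sophisticated schedulers and also have to restore disk and network state.

```python
def evacuate(placement, capacity, failed_node):
    """Return a new placement with VMs moved off `failed_node`.

    placement: dict mapping VM name -> node name
    capacity:  dict mapping node name -> maximum number of VMs it can host
    """
    new_placement = dict(placement)

    # Count how many VMs each surviving node already hosts.
    load = {node: 0 for node in capacity if node != failed_node}
    for vm, node in placement.items():
        if node != failed_node:
            load[node] += 1

    for vm, node in placement.items():
        if node != failed_node:
            continue
        # Only nodes with at least one free slot are candidates.
        candidates = [n for n in load if load[n] < capacity[n]]
        if not candidates:
            raise RuntimeError(f"no spare capacity for {vm}")
        # Prefer the surviving node with the most free slots.
        target = max(candidates, key=lambda n: capacity[n] - load[n])
        new_placement[vm] = target
        load[target] += 1

    return new_placement


# Example: node-a fails and its two VMs are spread over the other nodes.
placement = {"web1": "node-a", "web2": "node-a", "db1": "node-b"}
capacity = {"node-a": 2, "node-b": 2, "node-c": 2}
print(evacuate(placement, capacity, "node-a"))
```

The sketch only works if the surviving nodes have spare slots, which is why a platform like this still has to keep some excess capacity in reserve.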
High availability cloud servers remove the single point of failure because there is always another hardware node ready to take over from a failing node, allowing us to minimize downtime without the expense of having a replica physical server sitting idle. Of course, we do have redundant systems and excess capacity, but virtualization and the cloud allow us to manage them much more efficiently than we could with physical hardware.
Image: Flickr/kevin dooley