by Brian Terrell
This blog title may appear overly dramatic; however, I recently received this exact email from a CodePartners client:
The application is hosted on the Amazon cloud, as you perhaps remember. Apparently Amazon has lost the server that the application was on. Yes, they physically lost the server so the app is no longer accessible. We have been asked to reload the application once a server is available. Please let me know what, if any, the cost would be to reload the application.
Many may believe the cloud eliminates a client's responsibility for ensuring fault-tolerance. Fault-tolerance enables a system to remain operational despite the failure of a software or hardware component. At the very least, it includes the ability to recover data when a hard drive goes kaput. Imagine the sickening feeling of being unable to boot up a notebook computer and realizing that mission-critical intellectual property exists only on that machine. Yikes!
So, it's natural for some to assume that delegating the provisioning of hardware, software, power, and connectivity to a hosting provider also delegates fault-tolerance. It does not. Unless my agreement with the hosting provider specifically includes fault-tolerance management, I may be disappointed when (not if) a hosted component fails. I have to specifically ask and pay for fault-tolerance when deploying to the cloud.
Actually, the topic is much larger than just my agreement with my hosting provider. Many factors combine to create optimum fault-tolerance. Amazon offers additional web services that provide automatic failover, geographic hosting diversity, load balancing, database engineering, and many other essential capabilities. Used together, these services allow Amazon to deliver a much higher level of fault-tolerance than a single hosted server can provide. In addition, application design plays a key role, so it is essential that the business analyst understands both the need for fault-tolerant systems and the techniques for designing them. Finally, subscribing to a simple online backup service that copies critical data and code from one hosting provider to another on a scheduled basis adds peace of mind, as well as the ability to recover data quickly when a component fails.
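To make the backup idea concrete, here is a minimal sketch in Python of the copy-and-verify step such a service performs. It is only an illustration under stated assumptions: the function names back_up and sha256_of are hypothetical, a second local directory stands in for the second hosting provider, and scheduling is left to whatever job scheduler is available. It is not the service or procedure CodePartners or Amazon actually uses.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, backup_dir: Path) -> List[str]:
    """Copy every file under source_dir into backup_dir and verify each copy.

    backup_dir is a stand-in for storage at a second provider (an assumption
    made for this sketch). It must not be located inside source_dir.
    Returns the relative paths of the files that were copied and verified.
    Raises IOError if any copy does not match its original.
    """
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue  # directories are created on demand below
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file metadata
        # A backup that cannot be trusted is no backup: compare checksums.
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"backup of {relative} failed verification")
        copied.append(relative.as_posix())
    return copied
```

Running back_up on a schedule, and occasionally practicing a restore from the copy, is what turns a lost server from a crisis into an inconvenience.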
In the real-life example above, our client's application stores no data, and we keep multiple backups of completed code. Therefore, the client re-provisioned an Amazon machine instance, and CodePartners restored the application quickly. In addition, the client will subscribe to additional Amazon capabilities directed toward providing fault-tolerance. Regardless, the experience reinforces something everyone must understand when leveraging either cloud or on-premises technology: it is a question of when, not if, a technology component will fail. I must plan ahead by building fault-tolerance into the design and deployment of any technology component.