Sites classified as highly available are designed to be extremely resilient, offering the best protection against single points of failure and underlying platform errors. Availability as a whole covers failure, recovery and site resilience. It is commonly measured as the percentage of time that a system is up and working.
Improving availability requires comprehensively engineered solutions. Because it is notoriously difficult to predict when or how a system will fail, the best way to plan for improved reliability is to shorten recovery times. A day has 86,400 seconds, and 99.9% availability allows 0.1% of that, or 86.4 seconds, of downtime. So if a system can recover from a failure within 86.4 seconds, it can fail once a day and still provide 99.9% availability to users.
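The arithmetic behind that 86.4-second figure can be sketched in a few lines of Python. This is a minimal illustration, assuming every failure causes exactly the same recovery time and that failures never overlap; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def availability(failures_per_day: float, recovery_seconds: float) -> float:
    """Fraction of the day the system is up, assuming each failure costs
    exactly `recovery_seconds` of downtime and failures do not overlap."""
    downtime = failures_per_day * recovery_seconds
    return 1 - downtime / SECONDS_PER_DAY


def max_recovery_seconds(target: float, failures_per_day: float) -> float:
    """Longest recovery time per failure that still meets the target availability."""
    return (1 - target) * SECONDS_PER_DAY / failures_per_day


print(f"{max_recovery_seconds(0.999, failures_per_day=1):.1f} s per failure")
print(f"{availability(failures_per_day=1, recovery_seconds=86.4):.3%} available")
```

Doubling the failure rate to twice a day halves the recovery budget to 43.2 seconds, which is why shortening recovery time is such an effective lever.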
Alternatively, availability can be viewed as the ability to complete transactions successfully. For example, if a site handles 100,000 requests every 24 hours, 99.9% availability still means 100 failed requests every day, which is not an impressive figure. Planning requirements can differ when availability is calculated this way, because the figure depends on how many requests fail rather than on how long the site is down. In this scenario, varying the request traffic can also be a viable solution.
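Measured per request rather than per second, the same calculation looks like this. It is a sketch only; the function names are hypothetical, and the target is converted to an exact fraction so that, for example, 0.999 is treated as exactly 999/1000 rather than its floating-point approximation.

```python
import math
from fractions import Fraction


def request_availability(total_requests: int, failed_requests: int) -> float:
    """Availability measured as the share of requests that completed successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return (total_requests - failed_requests) / total_requests


def allowed_failures(total_requests: int, target: float) -> int:
    """Maximum number of failed requests that still meets the target availability."""
    # str() first, so 0.999 becomes exactly 999/1000 instead of a binary approximation.
    exact_target = Fraction(str(target))
    return math.floor(total_requests * (1 - exact_target))


daily_requests = 100_000
print(allowed_failures(daily_requests, 0.999), "failed requests allowed per day")
print(f"{request_availability(daily_requests, 100):.1%} of requests succeeded")
```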
Work Out Your Availability Requirements
Building a highly available site becomes progressively more expensive the closer you get to 100% availability, so you will need to make compromises and trade-offs to stay within budget. To calculate the possible benefits of a particular failure strategy, look at the usage profile and the failure impact of the feature you are focussing on.
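One way to see why each step towards 100% gets more expensive is to list the downtime each availability target permits over a year. The sketch below assumes a 365-day year (leap years ignored); the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(target: float) -> float:
    """Maximum downtime per year permitted by an availability target."""
    return (1 - target) * MINUTES_PER_YEAR


for target in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{target:.3%}: {downtime_minutes_per_year(target):,.1f} minutes/year")
```

Each extra "nine" cuts the allowance tenfold: 99.9% leaves roughly 8.8 hours a year, while 99.999% leaves only about five minutes, and every reduction has to be paid for in redundancy and faster recovery.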
For example, consider a profiling system (the feature) whose traffic is 3% writes and 97% reads (the usage profile). If it fails, users lose personalised content and customer authorisation becomes unavailable, so only anonymous users can shop (the failure impact), which in turn reduces availability.
One way to tackle this problem is to use Active Directory or SQL Server, both of which can provide authentication for larger sites while still supporting anonymous shoppers.
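The profiling example shows how a usage profile feeds into the benefit of a failure strategy. The sketch below compares failed requests during an outage with and without a hypothetical strategy that keeps reads working (for instance, serving profile reads from a read-only copy), so that only writes fail. The `FeatureProfile` type, the read-only fallback, the 100,000 requests per day and the assumption that traffic is spread evenly over the day are all illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class FeatureProfile:
    """Hypothetical description of one site feature for failure-impact planning."""
    name: str
    requests_per_day: int
    write_share: float  # fraction of requests that are writes; the rest are reads


def failed_requests(feature: FeatureProfile, outage_hours: float, reads_survive: bool) -> float:
    """Requests expected to fail during an outage, assuming traffic is spread
    evenly over the day. If reads_survive is True (the hypothetical read-only
    fallback), only the write share of the affected traffic fails."""
    affected = feature.requests_per_day * outage_hours / 24
    return affected * feature.write_share if reads_survive else affected


profiling = FeatureProfile("profiling", requests_per_day=100_000, write_share=0.03)
without_fallback = failed_requests(profiling, outage_hours=1, reads_survive=False)
with_fallback = failed_requests(profiling, outage_hours=1, reads_survive=True)
print(f"one-hour outage: {without_fallback:.0f} failed without fallback, "
      f"{with_fallback:.0f} with fallback")
```

With a 3% write share, keeping reads alive removes 97% of the failed requests for that feature, which is the kind of figure worth weighing against what the strategy costs to build.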
Site Failure Prevention
Site failures fall into three categories: software failure, hardware failure and human error. Without adequate planning, any of them can cause a site failure. Some of the most common problem areas include:
- Servers
- Security
- Network
- Hardware
- Electrical power
- Data
- Climate control
- Application software
It is therefore necessary to investigate every relevant area to achieve maximum availability. This investigation can be carried out in house, but many larger sites now choose to employ independent management companies to identify potential problems and propose viable solutions.
This is particularly important if in-house IT staff, or whoever in the company is responsible for site maintenance, lack the understanding needed to assess and improve availability. It is also essential that any changes are fully tested in a safe environment before being rolled out, an area where an availability expert will have in-depth knowledge.
In conclusion, to future-proof your site and ensure it offers maximum availability, and with it maximum revenue, the best recommendation is to employ an expert management company to investigate your site thoroughly and provide up-to-date availability solutions.
Simon Western is a fully trained computer programmer and IT technician. He enjoys sharing his computer knowledge on the blogosphere, creating how-to guides and tutorials. When he's not blogging, Simon works at ARC Systems, providing IT support and solutions to businesses in Essex and the south of England.