The Importance of Redundant Systems

Planning for the worst in IT is important. Quite often people can’t grasp how important IT is to their processes. It’s amazing the number of times I’ve talked to someone who swears they don’t use their computer that much for their business, only to have them complain five minutes later because their email is broken. Having a redundant IT infrastructure is important for many businesses, and uptime is an important part of that, just as much as downtime (I’ll explain later).

Hope for the best, plan for the worst

When planning for IT there are many external factors to take into consideration:

  1. Power Failure
  2. Network Failure
  3. Equipment Failure
  4. Internet Connection Failure
  5. User Error
  6. Configuration Error
  7. et al.
Because IT is reliant on so many factors, we need to consider which parts we absolutely need, and which parts we can go without for a bit longer. If we take away configuration and user error (I just threw them in there to make you aware of those factors as well), we are left with a bunch of failures that generally sit with an external service provider (i.e. something out of our control). The simplest solution for #1 is to use a UPS, or backup power solution. I have a simple formula to work out how much to spend on UPS equipment, based on a number of factors:
  1. Expected downtime (minutes)
  2. Desired uptime (minutes, how long we want to keep working)
  3. Connected equipment (phones/computers/laptops/speakers)
  4. Charge out rate
I’m sure my accountant loves me (god I hope he’s not reading), but I’m pretty harsh when it comes to getting a return on my investment. If we put some example numbers against those factors…
  1. 20 minutes
  2. 30 minutes +
  3. One Computer (~400 watts)
  4. $120/hr (I’m using a general rate here)
So in this example we’re expecting at least 30 minutes of continuous work during our power failure, and we have a pretty decent computer. You would need at least a 700VA UPS (there’s a nice calculator at APC), which could retail around the NZD $250 mark (plus GST). In my experience these will give almost an hour of usable time, but I’ll base it on 30 minutes to keep it easy. At $120/hr, each 30 minutes of work saved is worth $60, so to get a good return on our investment we would need 4 x 30 minutes of power failure to break even (well, just a fraction more). If we assume we get 2-3 years out of our UPS under good conditions, we’re likely to at least cover that investment, and depending on what part of New Zealand (or the world) you’re in, it is likely to return quite nicely. NB: This doesn’t factor in any of the other elements, such as frustrated employees/clients/customers and whoever else you need to keep happy, so it’s probably worth the investment.
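The break-even sum above is simple enough to sketch in a few lines of Python. This is only a rough illustration using the example figures from this post; the function names are made up for the sketch, GST is ignored, and it assumes every minute the UPS keeps you working is billable at the charge out rate.

```python
def outage_value(charge_rate_per_hour: float, minutes_covered: float) -> float:
    """Value of the work the UPS saves during one outage (assumes all of it is billable)."""
    return charge_rate_per_hour * minutes_covered / 60


def outages_to_break_even(ups_price: float, charge_rate_per_hour: float,
                          minutes_covered: float) -> float:
    """How many outages of this length it takes for the UPS to pay for itself."""
    return ups_price / outage_value(charge_rate_per_hour, minutes_covered)


# Example figures from the post: NZD $250 UPS (excluding GST), $120/hr, 30 minutes covered.
ups_price = 250.0
charge_rate = 120.0
minutes_covered = 30.0

per_outage = outage_value(charge_rate, minutes_covered)
needed = outages_to_break_even(ups_price, charge_rate, minutes_covered)

print(f"Each outage covered is worth ${per_outage:.2f}")
print(f"Outages needed to break even: {needed:.2f}")
```

Swap in your own charge out rate and UPS price to see how quickly (or slowly) the purchase pays for itself.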
Next on our list is Network Failure. This isn’t very common, but it’s a real bugger to get going again quickly. Having a great supplier of network equipment (such as switches) will be your best insurance here, unless you have the budget to build some redundancy into your network infrastructure.
Equipment failure can be catered for in a number of ways. If you’re a large enough entity you could have a spare workstation, or a communal workstation that is normally used only lightly for another purpose but could double as a workstation for a few days. If you’re a little smaller you could look at leasing equipment (I’ll probably go into the pros/cons of this in another post). If you’ve read my post about Standardizing Your Hardware, you’ll know that if you purchase well supported hardware it’s generally not too much of an issue, but still be aware of it.

Internet Failure

This really does need its own heading (and is the main reason for this post). The Internet would have to be one of the least understood parts of IT, mostly because it is a little complex and takes a bit of explaining (I don’t claim to understand it fully myself, but I have a pretty good idea).

The problem with Internet faults is that no company (or person) really likes to take ownership of them; it’s easier to blame something, or someone, else. That’s partly fair, because there are usually so many links in the chain that gets your broadband connection to your door that almost anything can go wrong. Most people think they’re connected to the Internet, and that’s it, but in reality you’re probably connected to your ISP and bouncing all through their network even before you’re online! What makes this tough to understand is that your ISP is often purchasing their Internet connection from someone else (an upstream provider), and even more commonly (or should I say HOPEFULLY) they purchase connections from several different providers for some redundancy. What this means for you (the end user) is that if one of their connections fails, you should hopefully experience no (or minimal) connection failure.

One thing I’ve found to be true lately is that often your provider’s network is really strong, and their upstream provider (or connection) is the part that’s broken. What makes this tough is that it’s really hard to get any redundancy in this situation if they don’t have a redundant connection themselves, and the only way you’ll find out is to ASK, and hope they’re not having you on.

So how do you get redundant Internet? Well, you can do many things (which is great!). One of those things is to talk to your ISP about your requirements, and see if they offer a guaranteed service in the form of an SLA (service level agreement), which guarantees whatever you’re asking for. You will pay a premium for this service, so it pays to work out exactly what you need. The other (and possibly cheaper) option is to get a second connection to your office. Depending on how heavily you rely on the Internet, you may want a fixed line (DSL) or a wireless connection. There are various rates available, and some providers may have ‘on demand’ options, where you pay a fee for the connection and then pay as you go when you need to use it.
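When you’re working out exactly what you need from an SLA, it can help to turn an uptime percentage into the amount of downtime it still allows. Here’s a minimal sketch of that arithmetic; the percentages are illustrative only (they’re not any particular ISP’s offering), and the function name is just something I’ve made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years, close enough for a rough comparison


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an uptime guarantee still permits over the period."""
    return period_hours * (1 - uptime_percent / 100)


# Illustrative uptime levels, not quotes from a real provider.
for pct in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime still allows about {hours:.1f} hours of downtime a year")
```

Compare the hours you get against what an outage costs you, and you’ll have a better idea of whether the premium is worth paying or a second connection makes more sense.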

It gets worse! There are other services you may rely on which aren’t just connection oriented. I’m mostly talking about email, but there are other services you may need or use. Where your email is hosted is important, and there are many considerations:

  • SPAM Filtering
  • Virus Filtering
  • POP/IMAP Access
  • Webmail Access
  • Uptime
There is a range of services available. I use a solution we created, which does a pretty fine job. My client Totali currently filters around 120,000 messages a day across multiple sites using SPAM Killer. SPAM Killer does what it says, and has yet to return us a false positive!
I think that about covers planning for some redundancy, but there is one last thing I’ve just remembered. Downtime!

Downtime

Downtime is an important part of uptime, odd as it sounds. To keep uptime at a suitable level you need at least some downtime, but the downtime I’m talking about is planned. You schedule planned maintenance for your systems or infrastructure (usually during a low demand period, i.e. the weekend), when you can apply security updates, make adjustments, or install new equipment. This means that you’ve planned, tested and implemented your changes with as little disruption as possible, which should keep your uptime at higher levels long term.

If you would like to know more about planning for uptime by planning downtime to keep your systems tip top, give us a call at Totali, where our only real downtime is when the power is off longer than we planned for, which is out of our control. Keeping your systems working well will help you add more to your bottom line, and keep your staff happy (plus you’ll probably sleep better at night).

P.S. Check out SysAdmin Day where they explain what it’s like from the opposite side of the IT coin (I’m sure you’ll laugh, but I can assure you it’s 100% accurate!).

