If there is one thing Sandy demonstrated, it is that you cannot make your DR plans too robust. There will always be unplanned-for eventualities when you approach your planning from the perspective of events that could occur.
In my opinion, a more comprehensive approach is to look at what you need to continue to operate and base your disaster planning on getting those critical facilities operational again as quickly as possible.
Horror stories from NYC
In New York City, there are a surprising number of data centres, and two were taken out of action by Sandy. Internap was completely engulfed. In the basement, salt water knocked out the fuel pumps, and water in the breather pipe put the fuel tank out of action.
Their header tank ran out of fuel after 12 hours, and they had to go offline for 12 hours while they waited for privately hired tankers to come in from 200 miles away. Even when the fuel arrived, they had to rig up a system to get the fuel upstairs into the header tank.
They then had to operate in this way on generator power for 10 days while they waited for mains power to be restored. Another company nearby, Peer1, had a similar problem and their employees were carrying fuel in buckets up 18 flights of stairs to keep their generator going.
These businesses were lucky to be able to get their hands on the fuel, given the critical shortages in supply. These examples show admirable commitment to getting services back online, but also fundamental flaws in their basic infrastructure and recovery planning.
Is the basement safe?
These examples, and many others, showed the danger of locating critical equipment in the basement, where it is often placed to protect it from other risks. There isn’t an ideal solution, but thorough DR planning should take more account of the risk of flooding, even from the kind of weather we have been experiencing here in the UK over the last week and earlier in the year. You don’t need a full-blown hurricane to cause an outage!
What can we learn?
Some lessons to be learned, for both in-house and outsourced facilities, include:
- Model your business operations – for business as usual and in the event of a disaster
- Plan for critical equipment failure, rather than for each individual scenario (a flood, a fire, a terrorist attack) – the scenarios become too numerous to plan for adequately
- Identify where the potential weak points are, work out what makes them weak and set up alerts for when their status changes (e.g. scheduled generator maintenance, audits of remote back-up facilities)
- Keep plans and audits up to date – part of the problem after Sandy was not just inadequate DR plans, but also failure to execute plans quickly enough
- Have back-up facilities available in a different area that is less likely to be affected by the same event and is supplied by a different power grid
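
To make the lessons above a little more concrete, here is a minimal sketch in Python of what modelling critical equipment and alerting on status changes might look like. It is only an illustration under my own assumptions: the names `Component`, `FacilityModel`, `at_risk` and `shared_grid`, the status values and the example equipment are all hypothetical, not taken from any real DR tool or from the companies mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


@dataclass
class Component:
    """One piece of critical equipment (hypothetical model)."""
    name: str
    location: str
    power_grid: str
    status: Status = Status.OK
    depends_on: List[str] = field(default_factory=list)


class FacilityModel:
    """Tracks components, their dependencies and status-change alerts."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.alerts: List[str] = []

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def set_status(self, name: str, status: Status) -> None:
        # Record an alert only when the status actually changes.
        comp = self.components[name]
        if comp.status is not status:
            self.alerts.append(f"{name}: {comp.status.value} -> {status.value}")
            comp.status = status

    def at_risk(self, name: str) -> bool:
        # A component is at risk if it, or anything it depends on
        # (directly or indirectly), is not in the OK state.
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            comp = self.components[current]
            if comp.status is not Status.OK:
                return True
            stack.extend(comp.depends_on)
        return False

    def shared_grid(self, primary: str, backup: str) -> bool:
        # True if both components sit on the same power grid.
        return (self.components[primary].power_grid
                == self.components[backup].power_grid)


if __name__ == "__main__":
    model = FacilityModel()
    model.add(Component("fuel_pump", "basement", "grid-a"))
    model.add(Component("generator", "roof", "grid-a", depends_on=["fuel_pump"]))
    model.add(Component("backup_site", "other-city", "grid-b"))

    model.set_status("fuel_pump", Status.FAILED)
    print(model.alerts)
    print("generator at risk:", model.at_risk("generator"))
    print("backup on same grid:", model.shared_grid("generator", "backup_site"))
```

The point of the sketch is the shift in perspective: it never asks whether there has been a flood or a fire, only whether a critical component, or something it depends on, has stopped working, and whether the back-up shares a single point of failure with the primary.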