HSL Disaster Recovery
In Musings 2014 #2 we talked about disaster recovery. Let’s talk about actually having to use the plan.
HardSoft Systems Ltd has been in business since 1984, and we have had many experiences with client disasters of varying degrees (unexpected server issues, a water tank bursting right above a computer room, a natural gas leak making a client evacuate their building for a day). However, we had never had to fully test a recovery plan against a long-term disaster until this winter.
During the *VERY* cold weather this February, a water pipe burst outside the 12-story building where one of our clients has its offices. Initially it was no problem: there was water in the parking lot, but with the client on the 6th floor we were not too worried. However, the city could not turn off the water at the first valve (it was frozen shut) or at the second, and it took more time to find the third. By then, water carrying gravel had jammed the sump pump in the underground hydro vault that supplied the building. The vault filled up, shorting out the transformers and causing a blackout in the immediate area.
Now, a power outage isn’t really that big of a deal in a large metropolitan area. It is a bit inconvenient, but we expect power to be restored fairly quickly, so initially none of us were worried. Since it was near the end of the day, all the staff were sent home. Managers worked out a notification system for when power would be back on and who would come on site to restore power to servers etc. So far so good. Some hours later, Hydro was restored to the area but not to the building. There were now 2 problems: the formerly sealed transformers had water inside them, which meant they needed to be replaced, and the water in the underground vault was freezing solid. Our client was given the bad news: power was not going to be restored to the building anytime soon.
The situation was very interesting from a number of standpoints:
1. Their office was physically OK, just inaccessible (no lights, elevators, heat, water etc.), and temperatures were -20°C in the day and -32°C at night.
2. They did not have a timeframe for restoration of services.
3. Approximately 60% of interaction with their clients was via a portion of their web site, which they hosted on site and which was now down.
4. They had no way to answer phones, as their phone system was supplied by traditional telco copper and an onsite PBX.
5. They were headed toward a monthly critical processing time.
6. Their remote data entry employees were unable to do their work.
The questions for management were:
1. How to communicate to their clients.
2. What to communicate to their clients.
3. What to tell their 45 employees.
4. What ‘level’ of disaster recovery to start to implement.
The factors on their side were:
1. A disaster recovery plan that had been developed cooperatively by senior management and HardSoft Systems.
2. Business interruption insurance.
3. Committed, flexible staff.
According to the disaster recovery plan, they had called HardSoft Systems as soon as the power first went out, so we were in the loop from the beginning. Since the critical monthly processing deadline was approaching, it was decided (in a series of conference calls) that a limited relocation of the business be undertaken. HardSoft Systems personnel got permission from building management to enter the suites long enough to remove 2 critical workstations and 3 of the critical servers. We also brought switches, UPSs, monitors, keyboards, power bars etc. from the HardSoft offices, as it was bad enough carrying heavy servers down 6 flights of stairs. Our client had rented a meeting room at a nearby hotel, so we set up a mini network there and were able to do the critical month-end processing.
We were still waiting for Hydro to give us some indication of when the transformers could be replaced, but so far all we had been told was ‘between 3 days to 2 weeks’, which wasn’t much good for planning. So, on the 2nd day of the outage, the decision was made to create a full offsite office for 50% of the employees. The conference room next to the one we were set up in was rented, and HardSoft was asked to make it happen. We started by making sure that their client web portal was redirected to the temporary location. We then called one of HardSoft Systems’ suppliers and arranged for rental workstations to be available (this was much better than carrying equipment down stairs one piece at a time) and lined up HardSoft Systems staff to do the install, config, setup and test. We also made sure that the remote staff were able to remote into the temporary office so they could start their work.
By 9:00 am the next morning, we had a fully functional office in the 2 conference rooms. Servers, 18 workstations (with all the applications the staff required for their work loaded and functional), 2 printers, 1 MFP, 2 business phones, etc. were all connected, joined to the domain and fully able to support the staff in their jobs. HardSoft Systems personnel were on hand to help with any questions or problems. In addition, HardSoft Systems personnel stayed in the hotel each night in case there were any issues after hours. It was a good thing too, as on night #2 the power went out in the hotel! (Yes, this is all true; we can’t make up stuff like this.) The HardSoft Systems staff on site were able to go down to the conference room, shut down the servers gently and, once the power was back on 3 hours later, bring everything up and make sure all was well before the staff got in the next morning.
Days later, once Hydro replaced the transformers and once the building’s services were restored and it had heated up, we were able to move the 3 critical servers and 2 workstations back into the offices quickly (much easier with elevators) and they were then back up and running.
So why tell this tale? There are a few reasons. One is that having a disaster recovery plan in place, along with business interruption insurance, meant that this 89-year-old business is still in business today. Another is that you want to partner with an IT company like HardSoft Systems that not only can help you develop a disaster recovery plan but also has experience in successfully executing one. A third is that the effects of extreme weather are rapidly becoming the most common cause of business interruption, so you need solid plans to deal with these interruptions and preserve the business you have worked so hard to build.
Call HardSoft Systems today at 1-800-263-8433 and we can get started on your plans or, if you already have a disaster recovery plan, we can review it and suggest improvements based on real-life experience.