At approximately 11:20AM NZT on 4 February 2014, there was a major electrical surge into the datacentre where all of our cPanel servers are housed. This caused the datacentre to lose power for approximately 2 seconds. Because power interruptions are a known concern in our industry, our upstream providers have both UPS and generator backups to mitigate any potential effect on clients, and we have never had any issues previously. The facility has run strictly on UPS power many times, both during our quarterly and yearly testing and through routine power surges.
This incident, however, proved far more problematic: a second surge followed, which took out multiple redundant/diverse-path breakers in the datacentre, resulting in a full-scale blackout. By 11:21AM NZT, utility power was restored. Our upstream providers' UPS systems, with multiple battery cabinets, deliver clean, conditioned power to the servers, network equipment, and the like. The breakers, however, kept tripping, rendering the UPS system and its battery cabinets useless. Our providers have the capability to take the UPS system out of the loop, using both internal bypass panels and external maintenance bypass panels; the facility was purposely built that way with the utmost redundancy in mind. But having just experienced a major full-scale power outage, they were not comfortable running on utility or generator power alone. They therefore immediately brought in their vendors to assess the situation and determine the best course of action.
By 12:15PM NZT, our providers had 98% of systems back online. They lost a major aggregation switch in this incident, along with a number of other switches, but they keep backups of everything, so our network operations team also had to restore backup images to spare switches on the routing/network side of our operation. By 12:45PM NZT, 99% of systems were fully functional. Given everything that happened, it was remarkable to have just about everything back up in under 2 hours.
The datacentre continued to run on generator/utility power until 6:30PM NZT, when one of our UPS vendors switched the UPS systems back in line. All members of the datacentre team were onsite, with all hands on deck, throughout the outage. The network has been going strong ever since, and we sincerely apologize for any issues our clients experienced. All websites were fully restored within a couple of hours of the power incident occurring, and secondary services, such as cPanel access, were restored not long after that. We always strive to provide each and every one of you with the highest levels of redundancy, coupled with the highest levels of support in New Zealand; this is the first event of its kind since we started in 2001. We believe we provide some of the best-quality web hosting in the New Zealand market. Ultimately, however, we are here to provide a platform of stability, and it failed for over an hour. Freak incidents can and unfortunately will occur, but our providers were fully prepared for this power issue (even though it was essentially outside of anyone's control), with backups of servers, switches, and RAID cards, plus the system administrators and network engineers needed to resolve the problem as fast as humanly possible.
We do not take this type of incident lightly, and our upstream providers are working with their vendors to determine what further steps we can take to make sure something like this never happens again. I'd like to personally thank all of our clients who did their best to stay calm as we pushed out updates on our Network Status page throughout this event, and replied personally to emails while the issue was being fixed.