So I promised I’d tell you why my websites were offline for a couple days.
My server is leased and lives in a data centre in Houston, Texas, owned by The Planet. On Saturday evening there was an EXPLOSION in the underground conduit that brings all the power into the data centre, where 9,000 servers are housed. The explosion blew out three interior walls around the electrical room. Thankfully, no one was injured. I chose this data centre because it has security, fire suppression systems, and back-up power generators. However, the fire department on the scene refused to authorize the start-up of any back-up generators because the explosion and fire were electrical in nature.
No one plans for a situation like this one. Who would have thought of an explosion!?! I do have my own back-up plan for emergencies. Every night I zip up all the accounts on my server and back them up to a server in another state. I knew that even if all the servers in Houston were damaged, I would be okay, because I had all my data up to the night before and could purchase another server and restore it.
At the beginning of the outage, it was not clear how long we would be without power to our servers. We did find out early on that all the servers appeared to be fine and hadn’t suffered any physical damage.
I debated restoring my sites on a new server. But if the outage was short, then by the time I had everything set up and the DNS changes submitted, I might have caused myself more downtime by switching servers. There is also the problem of having two versions of a database. Many of my websites run with a database back-end, and having two versions of the same database on two servers is a horrible data management scenario.
I soon found out that my server was in Phase 2 of the data centre, and that it was actually on the second floor. This meant that my server, and the other 6,000 on the top floor, would be the first back in business.
In a data centre, you can’t just switch on 6,000 computers as soon as the power kicks in. Because the electrical infrastructure coming into the building was destroyed, we would be relying on their built-in generators. First, that power has to be tested to make sure it is stable and clean, because surges would damage more equipment. Then they have to restore the air conditioning systems; 6,000 servers generate a lot of heat! By 4am on Monday my server showed signs of life. It went up and down a bit as they restored power to the network, but my server started up with no issues. About 10% of the servers on the second floor had some trouble restarting. Computers don’t always restart well after their power has been abruptly cut off.
My server is still running on back-up power from their generators, which I hear are being refuelled twice a day. The 3,000 servers on the first floor are taking longer to start up because of the damage to the infrastructure. They had to use an external generator that turned out to be faulty, so a new one had to be brought on site. Now the electrical conduit and electrical room have to be rebuilt before we can go back on regular power.
Some people are ranting and raving and making real fools of themselves. This wasn’t something predictable or scheduled. The world won’t end if my sites and my clients’ sites are down for a couple of days over a weekend. If my sites were more important, I would need a full back-up server in another facility, one that didn’t just store my nightly zip copies but ran a carbon copy synced with the primary server, so that any changes and updates were replicated to it. It might be something I consider down the road, but not right now.
The company that monitors my server for me, HostGator, is rumoured to be giving customers an entire month free because of the downtime. I am passing this along and giving my few hosting clients a free month to make up for any inconvenience.
Many people are ready to jump ship to another company. I think the opposite: after this, what data centre could be better prepared for an emergency than this one? I think I’ll stay.