#1
Outage Summary May 12th
Just to give people an idea of what happened.
Sometime after 6 PM EST, the data center experienced a power outage affecting the rack that our servers are in (more info on this is posted below). The servers were back up a few minutes later; however, the outage caused a hard drive failure in the primary database server's array, which ultimately destroyed the array and resulted in complete loss of its data. The data center provided me with a KVM over IP (kind of like a remote monitor, keyboard, and mouse), which I used to confirm that the array could not be restored.
One of the safeguards I have put in place is active real-time replication from the database software to another database server (one of our other zone servers). Both servers lost power, and although the second server didn't incur any hard drive issues, its data was still corrupted by the interrupted replication stream and by write processes that weren't completed (the RAID controllers have batteries, but the database software being halted causes issues as well). I ran that data through the database repair program to fix the incomplete writes and other oddities, and it notified me that the character data had lost 3 entries due to corruption. (These characters were later identified, rolled back, and compensated.)
Since the replicated database only holds select data (pretty much only player data), I first had to restore from the last *full* database backup, taken on May 7th, which included all the necessary background information such as database table structure and content. After that, I had to overwrite that database with the more specific player data from the replicated copy, and lastly I had to re-merge our latest content patch.
The secondary replication server then had to be reconfigured as the new primary server while continuing to run the zones it had before. After all that, it was just a matter of cleaning up internal configurations and so on, and getting the server booted up.
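The restore order described above (full backup first, then the replicated player data on top, then the content patch) can be sketched roughly as follows. This is only an illustration with in-memory dictionaries, not the actual tooling used for the recovery; the table names, the `rebuild_database` helper, and the sample rows are all hypothetical.

```python
from copy import deepcopy

# Hypothetical table names. The post only says the replica holds
# "pretty much only player data", so these are assumptions.
PLAYER_TABLES = {"characters", "inventory"}


def rebuild_database(full_backup, replica, content_patch, player_tables=PLAYER_TABLES):
    """Layer three sources in the order described in the post.

    Each argument maps a table name to a dict of {row_id: row}.
    The inputs are never modified; a new database dict is returned.
    """
    # Step 1: start from the last full backup (table structure + content).
    db = deepcopy(full_backup)

    # Step 2: overwrite the player tables with the newer replicated copy.
    for table in player_tables:
        if table in replica:
            db[table] = deepcopy(replica[table])

    # Step 3: re-merge the latest content patch on top of everything.
    for table, rows in content_patch.items():
        db.setdefault(table, {}).update(deepcopy(rows))

    return db


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    full_backup = {
        "characters": {1: {"name": "ExampleChar", "level": 10}},
        "items": {100: {"name": "Example Sword"}},
    }
    replica = {"characters": {1: {"name": "ExampleChar", "level": 12}}}
    content_patch = {"items": {101: {"name": "Example Patch Item"}}}

    restored = rebuild_database(full_backup, replica, content_patch)
    print(restored["characters"][1]["level"])
    print(sorted(restored["items"]))
```

The ordering matters: the backup supplies everything the replica never carried, the replica supplies the freshest player state, and the patch goes last so the older backup content cannot undo it.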
After the server was up, and once we got through the initial rush of resurrections and restored the 3 affected characters, I set up real-time replication to another separate server in case of another disaster :P. Figured I would give everyone a glimpse of some of the work involved.
Here is what the data center said about the power failure:
Quote:
#2
Flipping heck, that sounds like Japanese to me, but I understand that the brown stuff did hit the fan and that, as usual, you spent a lot of your own time away from your loved ones to sort it out for the benefit of all of us. For that, I can only thank you and give you a cookie if I see you in game.
Petitpas/Nagash
#3
Thanks!
We appreciate all the work and safeguards you've put in.
#4
Good job Rogean
#5
Thanks for all of your work to let us play this great game, Rogean.
#6
I understand little of the more intricate tech parts, but the main gist of it explains not only how much work goes into this, but how much time and energy you guys put into keeping this server up and running.
We get an XP bonus for the failure. Do they give you a discount or anything, Rogean? lol. Either way, this post should go a long way toward explaining why we all need to ante up and kick in some donations and/or -- ------. If not just to help keep the server running, then to show some appreciation for what you guys do. Thanks to all of you guys. P.S. OMW to Charlotte this weekend, good time for me to go in and cast Fear or Screaming Terror on 'em all.
__________________
~Knitemare T`Knite~
~Harbingers of Thule~
Last edited by Rogean; 05-13-2011 at 11:45 AM.
#7
Are you getting any SLA reimbursement? power outa
meh, offline discussion methinks
__________________
Vonkaar Best Druid on Project 1999 - I have a blue ribbon to prove it. Destroyer of Mooto - 7 times since 1999. The Bane of Bixies© Huge halfling balls.
#8
Anyway, that's a huge fuckup for a datacenter.
__________________
Vonkaar Best Druid on Project 1999 - I have a blue ribbon to prove it. Destroyer of Mooto - 7 times since 1999. The Bane of Bixies© Huge halfling balls.