The American Caliban (substitute) wrote,

Les nonnes se sont rasé les sourcils, et se sont jetées dans un puits les unes des autres

Wikipedia just broke, the same way LJ did: power outage, plus MySQL/InnoDB. Two cheers for LAMP and no cheers for whoever hosted that thing.

What happened?

At about 14:15 PST some circuit breakers were tripped in the colocation facility where our servers are housed. Although the facility has a well-stocked generator, this took out power to places inside the facility, including the switch that connects us to the network and all our servers.

What’s wrong?

After some minutes, the switch and most of our machines had rebooted. Some of our servers required additional work to get up, and a few may still be sitting there dead but can be worked around.

The sticking point is the database servers, where all the important stuff is. Although we use MySQL’s transactional InnoDB tables, they can still sometimes be left in an unrecoverable state. Attempting to bring up the master database and one of the slaves immediately after the downtime showed corruption in parts of the database. We’re currently running full backups of the raw data on two other database slave servers prior to attempting recovery on them (recovery alters the data).

If these machines also can’t be recovered, we may have to restore from backup and replay log files, which could take a while.
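
For anyone wondering why the restore-and-replay route "could take a while": the idea is to start from the last backup and re-apply every write logged since then, in order, so the time grows with the amount of log to get through. Here is a toy sketch in Python. It is not MySQL's actual log machinery; the function name, the log format, and the page names are all made up for illustration.

```python
# Toy model of "restore from backup, then replay the logs".
# Nothing here is MySQL-specific; every name is hypothetical.

def restore_and_replay(backup, log):
    """Start from a copy of the last backup and re-apply logged writes in order."""
    state = dict(backup)  # work on a copy so the backup itself is never altered
    for op, key, value in log:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op!r}")
    return state


# Assumed example data: a backup taken earlier, plus the writes made since.
backup = {"Main_Page": "rev 1", "Sandbox": "rev 7"}
log = [
    ("set", "Main_Page", "rev 2"),
    ("delete", "Sandbox", None),
    ("set", "New_Article", "rev 1"),
]
recovered = restore_and_replay(backup, log)
```

Working on a copy mirrors the caution in the notice above: since recovery alters the data, you want the untouched original around in case the attempt goes wrong.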

Is there an echo in here?