I know I am 11 days late, but only now have I found the time to write about this. I wanted to write about backups and disaster recovery in general, but what the heck, let's just tell the story.
Ma.gnolia was/is one of the biggest and best-known bookmarking services. I was updating populair.eu (soon!) when I noticed that its front page looked quite different. Reading it, I realized that the owner, Larry Halff, had somehow suffered a crash of the 500GB database and hadn't had the time (it's a two-man operation) to implement a professional backup and recovery system.
And when file corruption is in play... well, that can be very nasty! It can sit in your backups over time and only slowly, very slowly, rear its ugly head. But I understand his reasoning. These are always things in the back of your mind that you hope won't happen.
But when they do... they will stress you out so badly that you will probably have to work around the clock to untangle 500GB of spaghetti again. I've also read these horror stories about real-life companies that entrust all their source code to some non-enterprise (open source) versioning system... it works as long as all hell doesn't break loose. If it does, you will regret a lot of things, it will cost an awful lot of money, and in the best-case scenario your project, or even your company, won't go broke.
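One way to notice file corruption before it quietly spreads through every generation of your backups is to record a checksum for each finished backup file when it is written, and re-check those checksums periodically. Below is a minimal Python sketch, assuming the backups are plain files under one directory that should never change once written; the function names are hypothetical, not part of any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backup files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file under root (relative path, '/'-separated) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(old_manifest, new_manifest):
    """Compare two manifests of files that are supposed to be immutable.

    Returns (changed, missing): files whose contents no longer match the
    recorded digest, and files that have disappeared since the first scan.
    """
    changed = sorted(
        name for name, digest in old_manifest.items()
        if name in new_manifest and new_manifest[name] != digest
    )
    missing = sorted(name for name in old_manifest if name not in new_manifest)
    return changed, missing
```

The idea: run `build_manifest` right after a backup completes, store the result somewhere other than the backup disk, and later call `verify` against a fresh manifest. Any name in `changed` points at a file that was modified or corrupted after it was written, which is exactly the kind of damage you want to catch before your only good copy has been rotated out.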
Ma.gnolia experienced every web service's worst nightmare: data corruption and loss. For Ma.gnolia, this means that the service is offline and members' bookmarks are unavailable, both through the website itself and the API. As I evaluate recovery options, I can't provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours.
Some background: Ma.gnolia's database server suffered from file system corruption, which also corrupted its database backup, even though it was on a separate system. This much was bad luck. I was relying on a single backup; the database was fast approaching half a terabyte and I had been unable to implement a practical, economical solution to version that quantity of data. Having a more robust and comprehensive backup system in place was my responsibility; and, believe me, I know that I let you, the Ma.gnolia community, and myself down in that.
I am currently working with a data recovery company in hopes that they can recover a working version of the database. I will update this thread, our home page, and Twitter account as I hear from them, which unfortunately could be as late as next week.
Get Satisfaction thread: http://getsatisfaction.com/magnolia/topics/bookmark_recovery_tips
Recovery tips for Ma.gnolia: http://recovery.ma.gnolia.com/
This is a comment I read on the Get Satisfaction thread; it is harsh, but correct:
As for criticisms of how this is personal, no, it isn’t. I have no idea who Larry is and I had no idea that Magnolia was/is a one man band. That he may "feel bad" isn’t my concern right now. I’d rather that he wasn’t going through this but he is. I didn’t bring it upon him. He did.
Larry should be encouraged all right. He should be encouraged to leave system administration to professionals in the future! It would have cost under $400/month to have a proper backup in place, and that's with having a full daily, weekly, monthly backup of the 500GB database. Larry could have paid for three generations of backups for four years for less than what this episode is going to cost him, assuming this isn't going to cost him "the business", such as it was. It could have been done even cheaper than that if he had seeded the original backup using "sneakernet", stored the backups on a computer at home, and just transferred the deltas from the original on a daily, weekly, monthly basis. This disaster wasn't caused by Larry being unlucky. It was caused by his neglect, lack of clue in disaster recovery procedures, and hubris.
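The "daily, weekly, monthly" scheme the commenter describes is commonly called grandfather-father-son rotation. Here is a minimal Python sketch of just the retention decision (which backups to keep), assuming one backup per day and hypothetical defaults of 7 dailies, 4 weeklies taken on Sundays, and 12 monthlies taken on the 1st; it does not cover copying the data or the delta transfers mentioned above.

```python
from datetime import date, timedelta


def gfs_keep(backup_dates, today, daily=7, weekly=4, monthly=12):
    """Return the backup dates to retain under a simple
    grandfather-father-son rotation.

    The default counts are illustrative assumptions, not a
    recommendation taken from the original comment.
    """
    keep = set()
    for d in backup_dates:
        age = (today - d).days
        if age < 0:
            continue  # skip anything dated after "today"
        if age < daily:
            keep.add(d)  # son: every backup from the last `daily` days
        elif d.weekday() == 6 and age < weekly * 7:
            keep.add(d)  # father: Sunday backups from the last `weekly` weeks
        elif d.day == 1 and age < monthly * 31:
            keep.add(d)  # grandfather: 1st-of-month backups, roughly `monthly` months back
    return keep


# Example: one backup per day from 1 Jan to 1 Mar 2009.
backups = [date(2009, 1, 1) + timedelta(days=i) for i in range(60)]
kept = gfs_keep(backups, today=date(2009, 3, 1))
```

In this example 60 daily backups shrink to 12 retained ones (7 recent dailies, 3 older Sundays, and the 1 January and 1 February monthlies), which is why keeping several generations of a large database is far cheaper than keeping every copy.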
Also read Wired: http://blog.wired.com/business/2009/01/magnolia-suffer.html
One thing to keep in the back of our minds:
"Cloud computing becomes fog when it goes down," says Todd Spragins in a Twitter post.
So... don't assume anything about free online services:
Since there was no claim made (at least not that I noticed) that my data would be kept safe and sound, I don’t feel like we have much of a right to complain. I was nonetheless pretty shocked to learn that there was no backup made that could still be accessed. I just sort of assumed that a project of this nature would be using offsite backups of some variety, which I’m gathering was not the case.