On Portland - now if you've got a pair of headphones...
you'd better get 'em on and get 'em cranked up
On Portland
The Portland State University machine room - host of annarchy, aka people.freedesktop.org and the storage for personal git repos - had a rather catastrophic series of power failures yesterday. As in, both of the independent, UPS-backed power rails to the room failed simultaneously, multiple times. As best we can tell, one of those failures was during ext3 journal recovery of /home from one of the previous failures. fsck was utterly, utterly unable to cope with this.

So, /home got a reformat. /home on that machine was not backed up (and has never been advertised as permanent storage), so anything you had there is lost. Git repos can be restored from your own clones, of course.

music: kmfdm - diy

pong! (x6) || ping?
From: neillparatzo Date: September 30th, 2009 09:21 pm (UTC) (link)
We had a similar incident at Longtail with reiser3 which, luckily, a --rebuild-tree solved. But I'm avoiding reiser3 on new installs because of it. Should I be avoiding ext3, as well? I thought the definition of a journaled filesystem was that it is always consistent, regardless of interruptions. So what the hell?
From: fooishbar Date: September 30th, 2009 10:40 pm (UTC) (link)
It's the 'multiple times' bit which is the kicker. It's a rare filesystem that will survive power failure in the middle of journal replay, followed by power failure in the middle of fsck, followed by ... etc. There were a lot of unhappy-looking servers in the machine room: at least one machine per rack had a physical indication of failure (e.g. Dell machines flashing yellow instead of solid blue), not to mention the machines which had come up but with a lack of services, etc.
From: neillparatzo Date: September 30th, 2009 10:43 pm (UTC) (link)
I thought "always" meant "always", even during journal replays. I guess I'm wrong.
From: fooishbar Date: September 30th, 2009 10:49 pm (UTC) (link)
Technology is rubbish, news at 11.
From: laptop006 Date: October 3rd, 2009 12:10 pm (UTC) (link)
By the time you've got a rack of Dell machines there's usually at least one complaining about some stupid thing that's not actually a failure.

To be fair HP gear does too, but far less often.
From: fooishbar Date: October 4th, 2009 01:46 am (UTC) (link)
All our HP kit was physically sound, thankfully. Then again, the iLOs are shot, so eh.