Adam Jackson (ajaxxx) wrote,
@ 2009-09-30 12:54:00
Current music:kmfdm - diy
Entry tags:x

On Portland
The Portland State University machine room - host of annarchy, aka people.freedesktop.org and the storage for personal git repos - had a rather catastrophic series of power failures yesterday. As in, both of the independent, UPS-backed power rails to the room failed simultaneously, multiple times. As best we can tell, one of those failures was during ext3 journal recovery of /home from one of the previous failures. fsck was utterly, utterly unable to cope with this.

So, /home got a reformat. /home on that machine was not backed up (and has never been advertised as permanent storage), so anything you had there is lost. Git repos can be restored from your own clones, of course.




(6 comments)


neillparatzo
2009-09-30 09:21 pm UTC
We had a similar incident at Longtail with reiser3 which, luckily, a --rebuild-tree solved. But I'm avoiding reiser3 on new installs because of it. Should I be avoiding ext3, as well? I thought the definition of a journaled filesystem was that it is always consistent, regardless of interruptions. So what the hell?



fooishbar
2009-09-30 10:40 pm UTC
It's the 'multiple times' bit which is the kicker. It's a rare filesystem that will survive power failure in the middle of journal replay, followed by power failure in the middle of fsck, followed by ... etc. There were a lot of unhappy-looking servers in the machine room: at least one machine per rack had a physical indication of failure (e.g. Dell machines flashing yellow instead of solid blue), not to mention the machines which had come up but with a lack of services, etc.



neillparatzo
2009-09-30 10:43 pm UTC
I thought "always" meant "always", even during journal replays. I guess I'm wrong.



fooishbar
2009-09-30 10:49 pm UTC
Technology is rubbish, news at 11.



laptop006
2009-10-03 12:10 pm UTC
By the time you've got a rack of Dell machines there's usually at least one complaining about some stupid thing that's not actually a failure.

To be fair HP gear does too, but far less often.



fooishbar
2009-10-04 01:46 am UTC
All our HP kit was physically sound, thankfully. Then again, the iLOs are shot, so eh.


