Clean Up Your Mess
Code: Arkeia, break, Cambrian House, crash, jerk, system administration, wedge
Too many sysadmins is a bad thing, especially if one of them doesn’t care about keeping the servers up.
The development box at work wasn’t letting me check anything into Subversion: commits were just sitting there, not even timing out. In fact, so were updates. Something was seriously wrong.
I talked about it with a coworker and went to look at the box. After a few minutes of poking around, the problem became clear: someone had installed a backup program that was trying to set up some kind of fake filesystem, and it wedged the box. Any process that tried to read from disk froze and couldn’t even be killed with kill -9.
And thanks to this odd little behavior, I could see three reboot processes frozen, presumably trying to read the shutdown scripts. So the person who wedged the box knew they had wedged it, but they just left it that way.
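If you want to spot this kind of wedge yourself, here's a rough sketch of one way to do it on Linux. This is an illustration, not a record of what I ran at the time. On Linux, a process blocked in the kernel waiting on I/O shows state `D` (uninterruptible sleep) in `/proc/<pid>/stat`, and signals, including SIGKILL, aren't delivered until it leaves that state. The function names `parse_stat` and `uninterruptible_processes` are made up for this example.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the LAST ')'.
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state 'D'."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking, or unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A few `D` entries that come and go are normal on a busy machine. The warning sign is a list that keeps growing and never clears, like those three reboot processes.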
I got the coworker in the office to pull the plug on the box and it came up OK, but I edited /etc/init.d/arkeia to spit out the following note instead of trying to start the backup program:
Dear whoever the hell decided to install arkeia:
You left the dev box wedged overnight, wasting at least an hour of two coders’ time to figure out what you did and fix it. And we know that you know you broke it: we could see that you tried to reboot and then LEFT IT FOR SOMEONE ELSE TO DEAL WITH rather than actually fix it.
Don’t be a jerk! Clean up after yourself!
Please talk to Jim and Harkins and explain why you left the box broken before you try playing with arkeia and wedge the box again.