16:16 Thursday, March 03 2005

stumbling towards the finish line

Yet another incredibly crappy night. I should have known it was a bad sign when my phone was ringing just *before* I went to bed. More cvs stupidity from the same customer, but this time I was mentally functional enough to run the find command upfront, and it was all fixed in about 5 minutes. Then at 3:45AM, my phone is buzzing, so I stumble out of bed, go to check my email, and I get nothing. I'm still not at all sure what happened, but it looks like the mobo just *lost* the onboard IDE controller, which means that my entire / partition was effectively gone. So I'm barely aware, my phone is beeping every 60 seconds, and my computer is b0rk3d. I try to do a graceful reboot, but of course /sbin/shutdown can't be found either. So it's the RESET button. But while POSTing, it times out on the IDE controller again. So I power it off, wait 3 seconds, power it on, and I'm back in business. Almost. Crappy lame openvpn got dropped into the runlevel 5 startup, and it's hanging for whatever stupid reason. I wait like 30 seconds, and still no timeout, so it's the three-finger salute, and then interactive bootup so that I can hit the big N for openvpn.
But the fun didn't stop there. I had applied an FC3 update to X the day before, and that b0rked the nvidia driver so X wouldn't start. Oddly, my phone stopped beeping about 3 minutes ago, but I'm not sure wtf caused it to beep, since I can't get to my email yet. So I figure screw X, I certainly don't need that to ssh to the server in question/trauma. I get on it, and as best as I can tell, everything that should be running is running, but the load is on the high side (like 20 or so, which for this box is moderately high). Thankfully it was dropping, so I'm guessing that I was awakened by a high load warning *argh*. Once the server seems to be stable, I go back to figuring out wtf is up with X. I modprobe nvidia and try to fire up X again, but no love. So I reinstall the nvidia X driver, and then X comes up.
I log in and finally check my email, to confirm that the fscking server hit a load of 42 about 10 minutes earlier, and then gradually dropped off. Seeing as how I get paged when the 5 minute load average is over 27 and the 15 minute average is over 30, 42 blew all of that out of the water, and is quite honestly a record for this box AFAIK. At this point, I'm nearly wide awake from all the chaos, so I scan through email for anything else amusing from overnight. Nothing terribly exciting, just the usual coworker/customer craziness. I then try to figure out why this box's IDE controller went AWOL. /var/log/messages is 100% normal up until about 3:02AM, at which point there is nothing until the box is coming back up just before 4AM (when I had rebooted it). So this truly just went into some kind of coma without warning, where the log couldn't even be written to any longer cause the disk _disappeared_ for all intents and purposes. The box has been stable for the past 12 hours & 38 minutes, but that is a fraction of a drop in the bucket when it comes to uptime (I think the record for one of my servers at work was 348 days), so I'm certainly not getting any warm or fuzzy feelings. Part of me is very nervous, and the other part is trying to convince myself that this was some weird anomaly, like a neutrino flying through my body or something. At this point, I'm inclined to play the waiting game. Certainly if this goes to lunch again within the next 3-4 days, then I definitely will need to go the warranty RMA route, which will suck royally.
In other news, I think we've found two reasonable candidates to replace Derrick. Out of 3 interviews yesterday & today, two of them look promising. Unfortunately, my involvement kinda ended today, so who knows what will happen from here, but I tried my best to get this done.
I hauled home two large boxes of assorted stuff from work today. I'll have one more box tomorrow, and say my final goodbyes, and then 4.5 years of blood, sweat & fears are over. I can't say that I won't miss the place. Not everything was bad, but it had certainly gotten to the point where the bad was greatly overpowering the good, and I spent my time fighting too many battles and winning very few. I'll prolly have more sappy, nostalgic thoughts over the next few days.