18:26 Monday, June 06 2008

hardware hates me

Today was not at all fun. Yesterday morning, I checked email from work and discovered that several systems, all in the same rack, had suddenly fallen off the network on Saturday night at 10PM. I figured that there must have been some kind of power outage. Sadly, I was mistaken. I came in this morning to find all the systems in that rack powered up and humming along. The power hadn't gone out; the network had died on that rack. At first, I thought that perhaps the switch had failed. I tried power cycling the switch, and that seemed to help for about 20 seconds, but then everything fell off the network again. I was almost convinced that the switch was flaky when I noticed that the LED for one of the systems on the switch was flashing so fast that it was almost on solid. I powered down that system, and suddenly everything came back on the network. So it wasn't the switch, but a system that had gone off a cliff and started flooding the network with a broadcast packet storm. I got that temporarily under control, and moved on to the next fire.
A few weeks ago we ordered a bunch of Apple hardware for future test growth. One of the systems is an Apple Xserve. We received them about 2 weeks ago, but I've been so busy with many other things that I didn't get to unboxing them until today. I started with the Xserve, since I had never actually used one before. The thing wouldn't POST (or whatever it is that Apple hardware does during its initial self-test process). It powered up, the fans whirred, the LEDs flickered, but no video at all. After reseating the RAM, and cursing at it, I ended up calling Apple's support. The guy that I spoke to was very friendly, and well meaning, but seemed to be as perplexed as I was. The first thing that he had me try was to reseat the mezzanine card. This card is about 4 sq. inches, and plugs into the motherboard, with this wacky looking proprietary slot going out the back of the case. In order to reseat it, I had to remove 6 different screws. Sadly, this process didn't make any difference. Next he suggested that I just plug any random PCI-E graphics card into the system and try to use that instead of the mezzanine card. I questioned why this would work, given that the Mac workstations will not work with any random graphics card because they require the Mac EFI, but this guy insisted that it should work. Alas, it did not. Next he suggested that I use something called "Apple Remote Desktop (ARD)" to connect to the system. Besides the fact that I don't have ARD, I kinda needed to know the IP of the system for this, and since this was the first time I had booted it up, I had no clue what its IP address was (or even if it was booting far enough to get one). Surprisingly, this system had a real serial port. After quite a bit of trial & error on the speed of the port (the guy didn't know what it was), we finally discovered that 57.6k is the default speed of the serial port, and with that (and a handy "screen /dev/ttyS0 57600" on my Linux system), I got a tty.
Apparently the default root password is the serial number of the server. Apple is overnighting me a new mezzanine card, with the assumption that it's the bad component. I'm not holding my breath.
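
For next time, the trial and error on the serial port speed could be cut down by working through the usual standard speeds in order rather than guessing. Here is a minimal Python sketch that prints one "screen" command per candidate speed. The list of speeds and the name screen_commands are my own assumptions for illustration, not anything from Apple's documentation, and /dev/ttyS0 is simply the port on my Linux box.

```python
# Common serial console speeds to try, highest first.
# Assumption: this list is a generic set of standard rates,
# not something taken from Apple's documentation.
COMMON_BAUD_RATES = [115200, 57600, 38400, 19200, 9600]


def screen_commands(device="/dev/ttyS0", rates=COMMON_BAUD_RATES):
    """Return one `screen` invocation per candidate speed (hypothetical helper)."""
    return [f"screen {device} {rate}" for rate in rates]


if __name__ == "__main__":
    # Print the commands so each one can be tried by hand until a login prompt appears.
    for command in screen_commands():
        print(command)
```

Run each printed command in turn, and quit screen if all you get is garbage or silence. Garbage characters usually mean the port is alive but the speed is wrong.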