18:12 Wednesday, January 01 2008

goings on

In our test farm at work, we have a decent number of systems (9 floor-to-ceiling racks at this point, and growing). Late last week, one of the systems started misbehaving. Nightly tests were randomly failing for no apparent reason, but only when run on the GPU. My first thought was that the graphics card had gone bad, so we swapped it out. No love; the tests continued to fail at random. Since the failures were only on the GPU, at first I was really puzzled about what the problem could be, unless somehow the PCI-E slot on the motherboard was bad. Then it occurred to me that perhaps the power supply was the culprit: slowly dying, and not providing enough power, or not providing it reliably, when the GPU was grinding away for hours on end. This morning I swapped out the power supply, and that fixed it completely. I guess the lesson here is that if you have a system that is behaving erratically, and all the obvious culprits check out (RAM, etc.), consider that the power supply might be the problem.
In other news, we're searching for a manager for the CUDA QA group. Thus far, Ian has been the umbrella manager for all the CUDA folks, but the size of the group has increased significantly over the past year (yes, it's been just about a year since I joined), and one manager can't really scale when the subgroups keep growing in both number and size. So after discussing it with me and getting my agreement, we started searching for a QA manager. Ian was quick to state that he didn't need or want someone to be watching over us; it was just that he really was stretched too thin to stay on top of everything that impacted QA. I interviewed (phone screened, actually) a candidate today, and this person was marginal at best. Actually, he wasn't horrible, but there was nothing that I'd say stood out as noteworthy either. There is something a bit creepy about interviewing for your own manager. In a way, it's a good thing, since I certainly want to have input on who gets the job, but at the same time, being a good manager is a very non-abstract thing, and I'm finding it challenging to come up with good interviewing material.