Computer Wars

PC open heart surgery - the mobo hanging out

Last spring I had a series of problems with my PC. These were very nasty. I would start my PC and it wouldn't boot. The only solution would be a complete system reinstall. Not practical. After about a month of hard work, I finally fixed it by getting a new SATA cable for my disk drive. The connector on the old cable just didn't feel like it seated solidly. I replaced it with one that has a spring clip on the connector, and that solved the problem. It seems a flaky cable to your system disk can trash Windows. Not too much of a surprise.

Then this spring I got Windows Vista. It wasn't as bad as some people claim in the press, but there were still issues with system crashes and compatibility. But by this fall, Vista updates, game patches, and especially the video drivers from nVidia seemed to have caught up with most of the problems. I would still see a crash maybe once a month. It did teach me to get in the habit of saving my work (or my game) frequently.

This fall I decided to start gaming again. I got a new 24-inch monitor for my birthday with 1920x1200 resolution, so I wanted to play something newer with fancy graphics that would take advantage of my new display. I tried Titan Quest, an action-RPG. It's just a Diablo clone, but a good one. If you are going to copy a game, hey, that's a good one to copy. It was a lot of fun. But that's when my problems started. I couldn't even get through an evening of play without a system crash. And this game does have a flaw in my opinion (or at least in my circumstances): the SAVE system is not under user control. So whenever the system crashed, there was a good chance I would have to start the game over.

OK, it's just this game, right? So I spent about a week doing Vista updates, DirectX updates, game updates, video driver updates, and every other computer voodoo ritual I could think of. No success. After every attempt I would get a few hours in on the game. Just when I was starting to hope that it was OK, KABOOM - system crash. Well, you have to admit that the Microsoft guys have a sense of history for keeping the Blue Screen Of Death.

Old board - note the silver heat sink left of center with the MSI label on it

This is where I got into serious troubleshooting mode. The BSOD indicated that the crash was caused by a hardware problem. That is NOT a good thing for an intermittent problem. My first suspect was the memory system. I had done a memory upgrade a while back from 1GB to 2GB, and I had used two different manufacturers because I bought inexpensive (OK, cheap) add-on memory at Sim Lim Square. Mixing memory from different vendors can sometimes cause problems if the timing is on the edge. So out came the extra GB. That seemed to help, for a couple of hours.

Then it was into the BIOS. My motherboard is an "enthusiast" motherboard. Enthusiast is a euphemism for crazy people who make a hobby of overclocking their PCs to get every last drop of performance out of them. This is a strange activity because it basically consists of taking a system that is working well, tweaking it until it doesn't work well, and then spending a lot of time trying to fix it. Strange. My problems were not self-inflicted, but they were proving to be just as troublesome. I spent lots of time learning about DDR2 RAM timing (just what IS the best CAS latency setting for these RAM chips?). But after about two weeks of struggling with the memory, my system just seemed to keep getting sicker and sicker.

This was a difficult brand of troubleshooting. I didn't really have any tools to get in and see what was going on, in either hardware or software. So I had to rely on macroscopic symptoms and try to match them to the experiences other people reported on the web. I guess it must be a lot like being a doctor, an old-fashioned GP. Fever? Body ache? Serious hemorrhaging? Oh, you must have ebola.

New board has an elaborate (expensive) copper heat pipe and heat sink

For those who aren't engineers, let me say that intermittent problems are not easy. My processor is an AMD 4200, which runs at a basic clock speed of 2.2 GHz. That's 2,200,000,000 clock cycles per second. My quick arithmetic tells me that if it runs for four hours before it fails, it has completed over 30 trillion cycles successfully before it hits an error. So the error doesn't happen very often. To put that in perspective, consider trying to find one spelling mistake in a stack of books. If an average paperback novel has 400 words on a page and is 500 pages long (the long books my wife likes to read), that's 200,000 words per book. You would have to read roughly 150 million of those books, several times the entire book collection of the US Library of Congress, to find one spelling mistake. That's an error rate of about one in 30 trillion.
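If you want to check that back-of-envelope arithmetic yourself, here is a small Python sketch. It simply takes the numbers from the paragraph above (2.2 GHz, four hours between crashes, 400 words per page, 500 pages per book) and treats one clock cycle as one word read; the function and variable names are just made up for illustration.

```python
# Back-of-envelope check of the error-rate arithmetic above.
# Inputs come from the text; mapping "one clock cycle" to "one word read"
# is the analogy's assumption, not a property of the hardware.

CLOCK_HZ = 2.2e9              # AMD 4200 base clock, cycles per second
HOURS_BETWEEN_CRASHES = 4     # rough time the PC ran before failing
WORDS_PER_PAGE = 400
PAGES_PER_BOOK = 500


def cycles_before_failure(clock_hz: float, hours: float) -> float:
    """Clock cycles completed in `hours` of continuous running."""
    return clock_hz * hours * 3600  # 3600 seconds per hour


def books_to_read(total_words: float, words_per_page: int, pages: int) -> float:
    """Number of books needed to hold `total_words` words."""
    return total_words / (words_per_page * pages)


cycles = cycles_before_failure(CLOCK_HZ, HOURS_BETWEEN_CRASHES)
books = books_to_read(cycles, WORDS_PER_PAGE, PAGES_PER_BOOK)

print(f"cycles before failure: {cycles:.3e}")        # 3.168e+13
print(f"books at one word per cycle: {books:,.0f}")  # 158,400,000
```

Four hours at 2.2 GHz comes to about 31.7 trillion cycles, and at 200,000 words per paperback that works out to roughly 158 million books.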

All this digital stuff is fine, but if it's intermittent hardware, it sounds analog to me. So what about the power supply? Sure enough, the symptoms of a bad power supply are often very mysterious, very intermittent, and very inconsistent. So the next step was major surgery to replace the power supply. I wonder if it makes me a geek to admit that I had an extra brand-new PC power supply sitting around (doesn't everyone?). That helped. Now I could go several days without a crash. But it still wasn't right.

Now this was all pretty mysterious. I had owned this computer for six months before moving to Singapore, and it didn't have any problems (although I didn't use it as heavily then). And it really seemed to be sensitive to the application. Everyday stuff like email and IE didn't seem to cause problems nearly as often as a graphics-intensive game. And what else had changed? I bought a bigger display (from 1280x1024 to 1920x1200), so the graphics load was much greater. And oh yeah, I moved to Singapore, where it is always hot and humid.

I did a lot of research on the web, and believe me, there are a *lot* of places to look. The more research I did, the more I read about a heat problem with the motherboard in my PC. Apparently the Northbridge (the part of the chipset that handles the processor's interface to the graphics system) runs really hot on my mobo. So hot that MSI quietly revised the board about six months ago to add a much better heat sink. Hmm. Although I found places at Sim Lim Square that carried my motherboard, I wasn't sure that just replacing it was the answer. Instead I decided to make some major heat dissipation improvements. I bought a monster Northbridge cooler with a built-in fan and substituted it for the crappy heat sink on the Northbridge (yes, when I saw it, it was just a crummy aluminum one). Then I added an extra case fan to blow air right on the graphics card and the Northbridge cooler. More circumstantial evidence: the Northbridge sits right next to the graphics card, a major heat generator.

The best solution - a new HP Blackbird for Christmas

And that mostly fixed it. I still get about one crash per month. I suspect that's because when I changed the heat sink, the surface of the Northbridge package felt very uneven, so I am not sure the contact with the heat sink is as good as it could be. Overclockers actually polish the IC packages of major components and their heat sinks to get the best possible thermal contact. I decided that was too much work, and I didn't want to risk damaging anything. But I had spent about two months and an incredible amount of time on the problem. Good thing I had an extra PC (even if it does still run XP) and was able to use it while my regular PC was sick. (No, I'm not a nerd. Everyone has a spare PC at their desk in case their main system has problems.)

But my motherboard still isn't quite right. And I really need more than 1GB of memory with Vista, especially for cool games. And I need a new graphics card, because the one I have can't really handle a display this size. Not too long ago HP bought a company that specialized in high-end gaming PCs and came out with a totally awesome gaming rig that has been getting fantastic reviews everywhere. And it is just too cool. The final answer is obvious: get a new PC. So I have been really, really good because I am hoping to get a new HP Blackbird for Christmas. (As I write this, UPS tracking shows that my new Blackbird is somewhere over the Pacific Ocean between Anchorage, Alaska and Taiwan. The estimated delivery is the day after Christmas.) THAT should be the end of my heat problems with PCs.